ABSTRACT


One of the main problems in a multistage decision tree procedure is predicting the optimal features to be used at every node. An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes, taking into account the number of training samples used in estimating the statistics of each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Probabilities of error predicted by the proposed algorithm as a function of dimensionality are compared with experimental observations for aircraft and Landsat data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
CHAPTER 1
INTRODUCTION 


1.1 Multistage Classification 

A number of different types of classifiers are now in 
routine use in remote sensing. Most of these classification 
algorithms, using pattern recognition techniques, can be 
regarded as "single-stage" classifiers, where an "unknown" 
pattern is tested against all classes using one feature sub- 
set, and then the pattern is assigned to one of the present 
classes in a single-stage decision procedure. An example of 
such a procedure is shown in Figure 1.1. 

In recent years, as classification of multispectral data has found a larger number of users and a wider range of applications, the need has arisen for more powerful alternatives to the conventional classifiers, techniques through which more information can be extracted from the scene more accurately and/or efficiently. Some of the reasons for this need include:

1. The need to extract more detailed information from data. The opportunity to do so results from the emergence of more complex data sets. The growing use of multitype data bases containing Landsat data with a variety of other quantitative geodata, together with the anticipated launching of more sophisticated sensors such as the Thematic Mapper, results in the opportunity to extract considerably more information from the data.

2. The broadening of the range of applications. As 
pattern recognition methods have developed, they 
have found a larger number of users with a wider 
range of applications. The feedback from these 
different and versatile uses has indicated problems 
and needs not initially present. 

3. The ever present need for improved classification 
accuracy. There are some applications for which 
conventional classifiers have proved to be marginal 
at best. Some of these are listed in Swain et al. (1) and include multi-image analysis and the use of mixed feature types.

4. The need for improved processing efficiency. The 
conventional, single-stage, classifiers use only 
one particular feature subset and are somewhat 
inefficient, as they must compare an unknown pat- 
tern against all possible classes before assigning 
that pattern to a particular class. 
Because of these and other factors, there has been some research in recent years directed towards developing multistage classifiers, whereby the decision procedure goes through several stages before finally assigning a pattern to a class. An example of such a procedure is shown in Figure 1.2.


The purpose of this research is to develop a layered decision algorithm that can improve accuracy and efficiency relative to the conventional single-stage classification approach. Developing such an algorithm requires, among other things, a careful look at some parameters that are crucial to any successful attempt at tackling such a complex problem. In particular, three areas have to be investigated:

1. The development of an adequate training procedure to define an initial set of spectral classes with their respective statistics;

2. The investigation of various error estimators and the development of an adequate performance estimator that can reasonably predict the accuracy or trends in performance;

3. The development of an algorithm to build a binary 
tree making use of the above-mentioned methods. 
Figure 1.2 An Example of a "Multi-Stage" Algorithm in Classifying Multispectral Data.
Of these three areas, the most important problem is believed to be the development of an accurate error estimator, especially in the presence of what has come to be known
as the Hughes phenomenon (elaborated upon later in the 
review of literature). Predicting the conditions under 
which the Hughes phenomenon occurs provides the key to the 
solution of the problem. Therefore, a considerable portion 
of the research has been directed towards trying to under- 
stand and predict the impact of this phenomenon. 


1.2 Review of Literature 
1.2.1 Training Procedure 

Several training methods have been suggested in the 
literature. We will not attempt to list all of them, but 
rather will give a background of some of the methods 
reviewed and used in this work. 

The training process is the procedure whereby labeled samples are selected and used to compute class statistics, which in turn are used to classify unlabeled (i.e., "unknown") samples.
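As a minimal illustration of this step, the sketch below computes the statistics a Gaussian maximum likelihood classifier needs (a mean vector and a covariance matrix per class) from labeled samples. It assumes NumPy; the function names and the two class labels are hypothetical.

```python
import numpy as np

def class_statistics(samples):
    """Mean vector and unbiased covariance matrix of one class.

    samples: array-like of shape (n, p), one labeled sample (pixel) per row.
    """
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)   # normalized by n - 1
    return mean, cov

def train(labeled):
    """Map each class label to its (mean, covariance) estimate."""
    return {label: class_statistics(x) for label, x in labeled.items()}

# Hypothetical two-band training samples for two cover types.
stats = train({
    "wheat": [[10.0, 20.0], [12.0, 22.0], [11.0, 24.0]],
    "soy": [[30.0, 5.0], [32.0, 7.0], [31.0, 3.0]],
})
```

The sample covariance of n samples has rank at most n - 1, so with p features at least p + 1 samples per class are needed for a nonsingular estimate; this is one concrete way in which a limited training set constrains the usable dimensionality.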

Several parameter estimation methods (training methods) have appeared in the literature; sample-partitioning methods, the leaving-one-out method, and clustering are but a few. See, for example, Fukunaga (2) and Duda and Hart (3).
For remote sensing purposes, clustering has been widely used in developing training statistics. Two basic approaches have been used. In the supervised clustering approach, the analyst selects areas of known cover types; each set of areas belonging to one cover type is clustered separately, and the statistics for these areas are then obtained with the aid of a computer. In the non-supervised clustering approach, the entire training area is subdivided into clusters by the clustering algorithm, and each cluster is then identified by the analyst and given a specific label. The statistics of each cluster corresponding to a cover type or a subclass of a cover type are then calculated. Fleming et al. (4,5) investigated several clustering approaches and their effect on classification accuracy. Among the approaches they used were non-supervised clustering, supervised clustering, modified clustering, mono- (aggregate) cluster blocks, and multi- (class-conditional) cluster blocks.


1.2.2 Performance Estimators 

A key factor in the design of a layered decision algorithm is the ability to predict how the algorithm will perform, in terms of accuracy, at every node. While optimizing the performance at every node does not necessarily produce a globally optimal tree, it is still a very important and useful step in the design.
Several performance (or error) estimators have appeared
in the literature. Again, we will not attempt here to 
exhaustively list all the contributions made, but rather 
will give an idea of how the research in this area has pro- 
gressed. 

Performance estimators can be divided into two main categories:

Performance functions which have some sort of direct relationship with the probability of error. Examples are Parzen estimators (see (2)) and the k-nearest neighbor error estimator (see (6)). More recently, Mobasseri et al. (7)
published an error estimator that computes the minimum prob- 
ability of error through use of a combined analytical and 
numerical integration over a sequence of simplifying trans- 
formations of the feature space. The results have been 
shown to be similar to those obtained by conventional tech- 
niques. However, the algorithm becomes computationally too 
inefficient to use as the number of classes and/or features 
increases. Moore, Whitsitt and Landgrebe (8) (see also 
Whitsitt and Landgrebe (9)) developed a stratified posterior 
estimator which, like Mobasseri’s, depends only on a given 
set of statistics. This was later used by Wiersma (10) and 
both estimators (Mobasseri’s and Whitsitt’s) were compared 
in (11) and found to give similar results, with Whitsitt’s 
algorithm being faster in some cases. The former procedure 
uses a "deterministic" grid to sample the feature space,
while the latter uses an internally generated random data 
base and assigns the feature vector to the appropriate class 
via the maximum a posteriori principle. Both procedures 
assume normal class conditional statistics. 

Separability measures, most of which have only a subtle, indirect, and often unknown relationship to the probability of error. Various separability measures have been in common use in remote sensing applications. Among these are: Divergence (12), Transformed Divergence (13), Jeffreys-Matusita distance (14,15), Bhattacharyya distance (16), and the Mahalanobis distance (17). (See the list in (24).)

Several works have been reported comparing different 
separability measures and their effects on performance. (See 
(9,13,18,19,62).) 
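The idea behind the random-sampling estimator described above (generate samples from the given class statistics and assign each by the maximum a posteriori rule) can be sketched as follows. This is an illustrative Monte Carlo sketch assuming normal class-conditional statistics and NumPy, not the published implementation of either estimator; the function names are hypothetical. It returns both the fraction of misassigned samples and the average of one minus the largest posterior probability, each weighted by the class priors.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at every row of x."""
    mean = np.asarray(mean, float)
    cov = np.asarray(cov, float)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (maha + logdet + mean.size * np.log(2.0 * np.pi))

def monte_carlo_error(means, covs, priors, n_per_class=100_000, seed=0):
    """Estimate the Bayes error from class statistics alone.

    Samples are generated class by class (stratified by class) from
    N(mean_k, cov_k); posterior probabilities follow from Bayes' rule.
    Returns (fraction misassigned by the MAP rule,
             average of 1 - max posterior), both weighted by the priors.
    """
    rng = np.random.default_rng(seed)
    count_est, posterior_est = 0.0, 0.0
    for k, (m, c, pk) in enumerate(zip(means, covs, priors)):
        x = rng.multivariate_normal(m, c, size=n_per_class)
        log_joint = np.stack(
            [gaussian_logpdf(x, mj, cj) + np.log(pj)
             for mj, cj, pj in zip(means, covs, priors)], axis=1)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
        count_est += pk * np.mean(post.argmax(axis=1) != k)
        posterior_est += pk * np.mean(1.0 - post.max(axis=1))
    return count_est, posterior_est

err, post_err = monte_carlo_error([[-1.0], [1.0]], [[[1.0]], [[1.0]]], [0.5, 0.5])
```

For two equally likely one-dimensional classes N(-1, 1) and N(1, 1), the Bayes error is Φ(-1) ≈ 0.159, and both estimates should land close to that value.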

There are two problems with most of the above separability measures as applied in remote sensing: (1) ambiguity and (2) lack of linearity in pairwise error. The term ambiguity implies here that there does not exist a one-to-one relationship between the value of the measure and the probability of error. Linearity means that equal incremental changes in the measure imply equal changes in the probability of error over the whole range. Whitsitt (9) developed a distance measure that combines the Bhattacharyya distance B with the Gaussian error function erf(·). He found that the resulting measure is less ambiguous and more linear than the measure B.
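For reference, the sketch below evaluates several of the separability measures listed above for two normal classes. It is a sketch under stated conventions, assuming NumPy: the Mahalanobis distance uses the average of the two covariance matrices, transformed divergence is given on a 0 to 2 scale (it is often multiplied by 1000 so that it saturates at 2000), and the Jeffreys-Matusita distance uses the form sqrt(2(1 - exp(-B))); other normalizations appear in the literature. The function name is hypothetical.

```python
import numpy as np

def separability(m1, s1, m2, s2):
    """Pairwise separability measures for two normal classes N(m1, s1), N(m2, s2)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    s1, s2 = np.atleast_2d(np.asarray(s1, float)), np.atleast_2d(np.asarray(s2, float))
    dm = (m1 - m2).reshape(-1, 1)              # column vector m1 - m2
    i1, i2 = np.linalg.inv(s1), np.linalg.inv(s2)
    s_avg = 0.5 * (s1 + s2)

    divergence = (0.5 * np.trace((s1 - s2) @ (i2 - i1))
                  + 0.5 * np.trace((i1 + i2) @ dm @ dm.T))
    mahalanobis_sq = (dm.T @ np.linalg.inv(s_avg) @ dm).item()
    bhattacharyya = mahalanobis_sq / 8.0 + 0.5 * np.log(
        np.linalg.det(s_avg) / np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    return {
        "divergence": float(divergence),
        "transformed_divergence": float(2.0 * (1.0 - np.exp(-divergence / 8.0))),
        "mahalanobis_sq": mahalanobis_sq,
        "bhattacharyya": float(bhattacharyya),
        "jeffreys_matusita": float(np.sqrt(2.0 * (1.0 - np.exp(-bhattacharyya)))),
    }

r = separability([0.0], [[1.0]], [2.0], [[1.0]])
```

With equal covariances the covariance terms vanish, so the divergence equals the squared Mahalanobis distance and B equals one-eighth of it; the saturating forms (transformed divergence, Jeffreys-Matusita) then differ only in how they compress large values.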

Another key factor in the process of error estimation is the choice of feature subsets. The problems here are twofold:

1. As the number of features becomes large, it becomes desirable to choose a subset of these features that can adequately predict the accuracy. This selection process can also become expensive if one must search through all possible combinations of the feature set. It is therefore desirable to have a priori knowledge of the importance of each feature in relation to the probability of error. The Karhunen-Loeve expansion (attributed to Karhunen (20) and Loeve (21)) has historically been used in the pattern recognition literature as a feature selection technique. It has the advantage of producing uncorrelated features (exactly so in theory; in a practical K-L transformation estimated from samples, the features are only approximately uncorrelated). In addition, it imposes an ordering on the features in terms of importance in a representation-error sense. As a result, the first feature is "likely" to be more important than the second in calculating the probability of error, and so on. More recently, Oja and Karhunen (22,23) published two papers on the construction of K-L expansions for pattern recognition purposes that do not require the computation of any covariance matrices.

2. The probability of error is not necessarily monotonically decreasing as the number of features increases. This is due to a peculiar phenomenon that has come to be known as the Hughes phenomenon. Hughes (25) found that with a fixed and finite training pattern sample, recognition accuracy can first increase as the number of measurements on a pattern increases, but then decays when the measurement complexity exceeds some optimum value. He also reported that for unlimited training data this does not occur, and the recognition accuracy reaches an optimum only at infinite measurement dimensionality. According to Hughes, if insufficient sample data are available to estimate the pattern probabilities accurately, then a Bayes recognizer is not necessarily optimal. Many papers have since been published on this phenomenon, confirming it or trying to explain why it occurs (see (26-42)). Thus, it appears that a successful design should predict when and if such phenomena occur.
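The Karhunen-Loeve ordering described in item 1 can be sketched as an eigendecomposition of a sample covariance matrix, with features sorted by decreasing eigenvalue (variance). This is a minimal sketch assuming NumPy and a single pooled set of samples; which covariance matrix to expand (pooled, within-class, or mixture) is a design choice not fixed here, and the function name is hypothetical. On the samples used to estimate it, the transformed features are exactly uncorrelated, whereas on new data they are only approximately so.

```python
import numpy as np

def karhunen_loeve(samples):
    """Project samples onto covariance eigenvectors, largest variance first.

    Returns (transformed samples, eigenvalues in decreasing order, basis).
    """
    x = np.asarray(samples, float)
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder: most variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs, eigvals, eigvecs

rng = np.random.default_rng(1)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.3]])
x = rng.standard_normal((500, 3)) @ mix        # correlated synthetic features
y, variances, basis = karhunen_loeve(x)
```

The ordering is by variance, i.e., by representation error; as the text notes, this makes the first feature only "likely" to matter most for the probability of error.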
1.2.3 Multistage Classifiers

In recent years, some work has appeared in the litera- 
ture aimed at developing multistage classification algor- 
ithms. There is much yet to be learned about such algor- 
ithms, and no work has been reported claiming optimality (or 
even close to optimality) of results. 

In general, earlier work can be grouped into two main 
categories: 

Sequential classification methods. These can be found in several papers and books (see, for example, (43-45)). Basically, the classifier observes feature measurements one at a time. After each observation, the classifier either reaches a final decision and the process is terminated, or it makes another observation, continuing until a final decision is reached.

Hierarchical classification methods. These are subdivided into two categories:

1. Hierarchical clustering methods. Examples of such work are found in Fukunaga (2); Dubes and Jain (46), who present a semi-tutorial review of the state of the art in cluster validity; and Lukasova (47). In general, hierarchical clustering is designed to generate a classification tree. The "root" node of the tree represents a collection of samples (either a training data set or the entire sample set), and each terminal node represents either an individual sample or a group of samples belonging to some class within the set of classes in the data set. The method attempts to divide the set of samples in each node into disjoint subsets which form new nodes. Defined as such, the method is often nonparametric and depends heavily on the ability of the algorithm to find meaningful divisions of samples that correspond, at terminal nodes, to meaningful classes.

2. Decision trees and criterion functions. Most of
the work done in multistage algorithms belongs to this cate- 
gory. Often, a decision tree is built using an 
optimization or criterion function that dictates the 
structure of the tree. It is this kind of approach that 
will be of greatest concern in this research. 

Hierarchical methods differ from sequential methods in 
certain important respects. While in sequential schemes any 
class can be accepted at any stage of the measurement pro- 
cess, in hierarchical schemes certain classes are excluded 
from consideration at each stage. Also, sequential methods 
impose a linear ordering on the features. In hierarchical 
methods, features used along one decision path can be diffe- 
rent from those used along another path. 
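One common concrete form of sequential classification is Wald's sequential probability ratio test. The sketch below applies it to two classes, assuming the feature measurements are independent and normally distributed given the class, with thresholds from Wald's approximations for target error rates alpha (deciding class 1 when the truth is class 2) and beta (the reverse). These modeling choices and the function names are illustrative assumptions, not taken from the references cited above.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def sequential_classify(features, class1, class2, alpha=0.05, beta=0.05):
    """Wald-style sequential test between two classes.

    features: measurements in the order they are observed.
    class1, class2: per-feature (mean, sd) pairs, at least len(features) long.
    Returns (decided_class, number_of_features_used).
    """
    upper = math.log((1.0 - beta) / alpha)     # accept class 1 at or above this
    lower = math.log(beta / (1.0 - alpha))     # accept class 2 at or below this
    llr = 0.0
    for n, x in enumerate(features, start=1):
        (m1, s1), (m2, s2) = class1[n - 1], class2[n - 1]
        llr += normal_logpdf(x, m1, s1) - normal_logpdf(x, m2, s2)
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 2, n
    # Measurements exhausted: fall back to the sign of the log-likelihood ratio.
    return (1 if llr >= 0.0 else 2), len(features)

c1 = [(1.0, 1.0)] * 10
c2 = [(-1.0, 1.0)] * 10
```

Each call returns both the decision and the number of measurements consumed, which is the quantity a sequential design trades against error. In contrast, a hierarchical scheme would exclude classes at each node rather than keep all of them in play.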

In 1971, Nadler (48) tried to calculate error rates in a hierarchical decision structure under assumptions of statistical independence among the members of the hierarchy. Even under such assumptions, the results assume "small" probabilities of error at every level.

Several heuristic methods of constructing tree designs 
have been proposed in the literature. Some studies were 
done using optimization methods to automate the classifier 
design procedure, but the assumptions made were often too 
restrictive. Meisel and Michalopoulos (49) in 1973 presented a two-stage partitioning algorithm for the design of
an optimal binary tree. In the first stage, a suboptimal 
sufficient partition is obtained. The second stage optim- 
izes the result of the first stage through a dynamic pro- 
gramming approach. The method allows only for linear dis- 
criminant functions to partition the space, certainly a 
suboptimal and too restrictive condition. 

In 1974, Wu et al. (50) reported on a decision tree approach with direct application to multispectral data analysis. Several design procedures were proposed (one of which is manual), with special emphasis on a heuristic, machine-implemented approach. The optimality criterion used is a weighted sum of computation cost and accuracy. Results were presented which showed superiority in efficiency (but infrequently in accuracy) over the conventional classifier. Because the criterion function cannot predict beforehand the structure of the tree below a given node, it assumes that all the nodes below the node under consideration are terminal nodes, and hence it is necessarily suboptimal. Later papers have pointed to applications using this particular classifier (51,52).

In 1976, You and Fu (53) presented a linear binary tree classifier that uses linear discriminant functions at decision stages, with an application to multispectral remotely sensed data. The procedure includes a grouping algorithm, a separability measure, and an error minimization procedure using the Fletcher-Powell algorithm (54). Again, the procedure is certainly suboptimal because of the assumption of linearity. The reported results, though, show that this classifier is much faster and more accurate than the maximum likelihood classifier with the same number of features. This is due to the fact that the procedure uses different feature subsets (with a restriction on their number) at each node, compared with only one feature subset used in the one-stage maximum likelihood classifier.

Kulkarni and Kanal (55) used dynamic programming and 
branch-and-bound methodologies in the design of hierarchical 
classifiers. The criterion of optimality they used is a 
weighted sum of the probability of error and the average 
measurement cost incurred in classifying a random sample. 
The design assumes that the features used at the nodes are statistically independent and that the decision at each node is a function of only that particular feature observation. In effect, this restricts the design to using only one best feature at each tree node. Further, the design of the optimal tree assumes a very low error rate for the tree, a very restrictive assumption, since in many cases a high error rate is precisely the reason why a layered classifier was selected, i.e., to improve the accuracy. Although the authors presented some methods to reduce the complexity of their design algorithms, the examples they used involve only a small number of classes and features.

In 1977, Parikh (56) compared several cloud classification techniques, including hierarchical design. However, the paper offers no new insights or major results that would help improve the state of the art.

Also in 1977, Sethi and Chatterjee (57) developed an algorithm for the design of an efficient decision tree with application to pattern recognition problems involving discrete variables. A criterion function was defined to estimate the minimum expected cost of a tree in terms of the weights of its terminal nodes and the costs of the measurements; this function was then used to establish the search procedure for the efficient decision tree. The concept of prime events was used to obtain the number of nodes and the corresponding weights in the design sample. No optimality claim was made, but the procedure was found to lead to the optimal tree in most of the cases. The procedure uses only one feature at every node, and its applicability to remotely sensed multispectral data is very doubtful.

In 1978, Breiman (58) presented a procedure for building a binary classification tree. He used a criterion function that is only a function of the parent node and the two descendant nodes, and he used one best feature at every node. He also reported on another regression algorithm, developed at the Survey Research Center, University of Michigan (59), in which the criterion function tries to reduce the variances of the two descendant nodes as much as possible from the variance of the parent node.

Rounds (60) in 1979 developed a binary decision tree algorithm, but again one feature is selected at every node. The approach is a nonparametric one, based on the Kolmogorov-Smirnov criterion.

Dattatreya and Sarma (61) in 1981 presented a multistage binary tree "minimum-cost" classifier for the case in which general cost functions are associated with the tasks of feature measurement. The optimization of the binary tree is carried out using dynamic programming. However, only one feature is selected at every node.

In summary, most of the work done with multistage classifiers imposed too restrictive assumptions or conditions, such as using only one feature at each node or having a linear discriminant function. Moreover, very few results have been reported on situations where the Hughes phenomenon occurs, namely, when working with a limited set of training samples.

The major contributions of this research are the following:

1. The development of some theoretical results that clearly show the dependence of the accuracy of the estimated statistics of the classes under consideration on the number of training samples used to estimate those statistics, as well as on the number of features used.

2. The development of an error estimator which is particularly useful when the number of training samples is limited, and which is suited for a binary tree classification procedure. This estimator, which allows the selection of a "near optimal" feature subset at every node, has no restrictions on the number of features that can be used at any node.

3. The incorporation of the above error estimator in a binary tree procedure, showing the usefulness of such a procedure in predicting the optimal features that lead to the best accuracy attainable with a fixed set of training samples.
1.3 Summary of Contents

In chapter 2, some parameter considerations for a multistage binary tree classifier are addressed in detail. The Hughes phenomenon is elaborated upon, and a technique known as "simultaneous diagonalization" is introduced. Feature selection techniques are also treated, as is a data simulation algorithm that is used repeatedly in the research.

In chapter 3, an approximation algorithm to the proba- 
bility of error is proposed that takes into account the 
Hughes phenomenon. 

Chapter 4 presents experimental results on real and 
simulated data. 

Finally, chapter 5 summarizes conclusions about the 
study. Some analytical details, together with computer 
listings and training data are placed in appendices. 
CHAPTER 2
PARAMETER CONSIDERATIONS FOR A MULTISTAGE BINARY TREE CLASSIFIER


2.1 The Hughes Phenomenon 

One of the major needs for a decision tree classifier originates from a dimensionality problem often referred to as the Hughes phenomenon (25). A considerable portion of this research is directed towards understanding the Hughes phenomenon. Figure 2.1 illustrates the phenomenon conceptually. In the presence of a limited training sample size, the mean recognition accuracy as a function of the measurement complexity (the number of features, for our purposes) exhibits a peaking effect. Contrary to intuition, the mean accuracy does not always increase with additional measurements. Further, the peak of the curve shifts up and to the right as the number of samples increases, disappearing in the case of an infinite number of training samples (complete knowledge of the underlying distributions).

Figure 2.1 The Hughes Phenomenon (mean recognition accuracy vs. measurement complexity).

Figure 2.2 suggests a concept for one possible explanation of this phenomenon. Figure 2.2a shows a hypothetical graph of class separability plotted vs. dimensionality. As dimensionality increases, so does class separability (a nondecreasing function of dimensionality), until it saturates and any further increase in dimensionality has no significant effect on class separability. But this is not the only effect on the mean accuracy. With a fixed, limited training sample size, any increase in dimensionality necessarily results, on the average, in a degradation in the accuracy with which the statistics of the class distributions are estimated. Thus, conceptually, one should expect a curve similar to that of Figure 2.2b. Further, as the number of samples increases, the curve should shift to the right; i.e., for any given dimensionality, a larger sample size should provide a better estimate of the true distributions. Assuming these two effects are the dominant effects on accuracy, combining them results in Figure 2.2c, a curve similar to Figure 2.1. Based upon this concept of the phenomenon, the solution to the problem lies in being able to predict quantitatively how the number of samples present affects the accuracy of the estimated statistics. Especially in remote sensing applications of pattern recognition methods, training samples are limited, as ground truth is often unavailable or difficult to obtain. Thus, the importance of the Hughes phenomenon becomes evident, as does the relevance of this conceptual explanation of it.
The Hughes phenomenon has been studied by many researchers (see (26-42)). Hughes (25), who was one of the earliest to introduce it and treat it in some detail, tried to explain it from a nonparametric point of view. The explanation given by Wacker and Landgrebe (62) concerns another nonparametric case, in which the Euclidean distance measure is used for discrimination among classes.

Several researchers (28-34) tried to study the effect 
of limited training sample size and independence of measure- 
ments on the recognition accuracy. 

In 1979, Trunk (38) provided a simple example in which he showed theoretically that, in a two-class problem, the probability of error approaches zero as the dimensionality increases when all the parameters are known, but approaches one-half as the dimensionality increases when the parameters are estimated (from a fixed number of training samples).
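Trunk's result can be checked numerically. The sketch below assumes the construction usually associated with his example: two equally likely classes N(mu, I) and N(-mu, I) with mu_i = 1/sqrt(i). With known parameters the Bayes error is Φ(-|mu|), and |mu|^2 = Σ 1/i grows without bound. For the estimated case we assume, for simplicity, that mu_hat is the mean of n samples from the first class and that the rule sign(x · mu_hat) is used; the error conditional on mu_hat is then Φ(-mu · mu_hat / |mu_hat|), so only mu_hat has to be simulated. The function names are hypothetical.

```python
import math
import random

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def known_parameter_error(p):
    """Bayes error for N(mu, I) vs N(-mu, I), mu_i = 1/sqrt(i), equal priors."""
    return phi(-math.sqrt(sum(1.0 / i for i in range(1, p + 1))))

def estimated_parameter_error(p, n, trials=200, seed=0):
    """Average error of the rule sign(x . mu_hat) when mu_hat is the mean of
    n samples from N(mu, I), i.e. mu_hat ~ N(mu, I/n)."""
    rng = random.Random(seed)
    mu = [1.0 / math.sqrt(i) for i in range(1, p + 1)]
    total = 0.0
    for _ in range(trials):
        mu_hat = [m + rng.gauss(0.0, 1.0) / math.sqrt(n) for m in mu]
        norm_hat = math.sqrt(sum(v * v for v in mu_hat))
        # Given mu_hat, x . mu_hat ~ N(mu . mu_hat, |mu_hat|^2) for class 1
        # (mirror image for class 2), so the conditional error is exact.
        total += phi(-sum(a * b for a, b in zip(mu, mu_hat)) / norm_hat)
    return total / trials
```

With n = 5, the estimated-parameter error first decreases and then climbs back toward one-half as p grows, while the known-parameter error keeps falling, which is the peaking behavior sketched in Figure 2.1.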

In remote sensing applications, where maximum likelihood classifiers are frequently used and where the assumption of class-conditional multivariate normally distributed data is invoked, not much work concerning the dimensionality problem has been reported yet. Wacker and El-Sheikh (40-42) presented some papers dealing with dimensionality problems for two-class Gaussian problems. Their results again show a Hughes phenomenon occurring with finite training data.
It then follows that any error estimator in a multistage classification algorithm that can claim some optimality in results from an accuracy point of view should be able to predict when and if peaking occurs in the curve mentioned earlier. It is this key problem that this research attempts to solve, i.e., the development of an error estimator that can accurately predict the Hughes phenomenon.

Working with multispectral data, one almost always has
to work with multiple feature measurements and multiple 
classes. In this research, we propose a binary tree multis- 
tage classifier. This means that any node in the tree is 
either a terminal node or is further subdivided into two 
nodes (with statistics corresponding to two classes). 
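The binary-tree structure just described can be represented with a small recursive data structure. The sketch below is hypothetical: the Node fields, the decision callables, and the class names are illustrative assumptions, not the design used in this research. It only shows that each internal node may look at its own feature subset, while a terminal node carries a class label.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Node:
    """Hypothetical binary-tree node: terminal if `label` is set, else a split."""
    label: Optional[str] = None                      # class assigned at a terminal node
    features: Sequence[int] = ()                     # feature subset used at this node
    decide: Optional[Callable[[Sequence[float]], bool]] = None  # True -> go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def classify(node: Node, x: Sequence[float]) -> str:
    """Descend from the root, each node looking only at its own feature subset."""
    while node.label is None:
        subset = [x[i] for i in node.features]
        node = node.left if node.decide(subset) else node.right
    return node.label

# Illustrative tree: the root uses feature 0, its right child features 1 and 2.
tree = Node(
    features=[0], decide=lambda v: v[0] < 5.0,
    left=Node(label="water"),
    right=Node(features=[1, 2], decide=lambda v: v[0] > v[1],
               left=Node(label="corn"), right=Node(label="soybeans")),
)
```

In the procedure described here, each split would separate two groups of classes using a feature subset chosen for that node; the lambdas above merely stand in for such decision rules.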

The advantages of a binary tree procedure are the following:

1. Working with two classes allows a theoretical understanding of the problem. Many pattern recognition results that apply to two-class problems fail to do so in multi-class ones. This is particularly true of the "simultaneous diagonalization" technique that will be introduced shortly.

2. Most feature selection algorithms used in pattern recognition applications generally, and in remote sensing applications specifically, are optimal only when applied to two-class problems. For multi-class problems, a separability criterion is averaged over pairs of classes and thus is optimal only in an average sense. Working with a binary tree, then, should provide us with both convenience and accuracy.


When working with multiple features, several properties are desired in these features which will make further analysis easier:

Uncoupled (Independent) Features. Uncoupling the features from one another simplifies analysis a great deal, as it permits evaluating the effect of each feature separately from the other features.

Ordered Features. If the features can be ordered, or at least approximately so, in terms of their effect on the probability of error, then the process of feature selection is made easier.

Optimal Separability. The features should be optimal with respect to the probability of error for the two distributions at hand. In other words, the feature subset should be tailored to the separability of the two distributions.
To this end, a technique known as "simultaneous diagonalization" (63, 64) is discussed in the next section.


2.2 Simultaneous Diagonalization: Theory

Let $\hat{\Sigma}_1$ and $\hat{\Sigma}_2$ be the estimated covariance matrices for classes 1 and 2, respectively. We seek a transformation matrix A such that

$$ A^T \hat{\Sigma}_1 A = I, \qquad A^T \hat{\Sigma}_2 A = \Lambda \tag{2.1} $$

where I is the identity matrix and $\Lambda$ is a diagonal matrix.

This transformation uncouples the features without affecting the probability of error, since the latter is invariant under nonsingular linear transformations. We proceed to find such a transformation as follows. (For more details, see (2), pp. 31-35.)

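Let $\Theta$ and $\Phi$ be the eigenvalue and eigenvector matrices of $\hat{\Sigma}_1$, respectively; then

$$ \Theta^{-1/2}\Phi^T \hat{\Sigma}_1 \Phi\,\Theta^{-1/2} = I \tag{2.2} $$

$$ \Theta^{-1/2}\Phi^T \hat{\Sigma}_2 \Phi\,\Theta^{-1/2} = K \tag{2.3} $$

where K is a general (not necessarily diagonal) matrix.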
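Next, we diagonalize K. To find the eigenvalues of K, it is necessary to solve the equation

$$ |K - \lambda I| = 0 \tag{2.4} $$

Replacing K and I in (2.4) by (2.3) and (2.2), respectively, we get

$$ \left| \Theta^{-1/2}\Phi^T \hat{\Sigma}_2 \Phi\,\Theta^{-1/2} - \lambda\, \Theta^{-1/2}\Phi^T \hat{\Sigma}_1 \Phi\,\Theta^{-1/2} \right| = 0 \tag{2.5} $$

or

$$ \left| \Theta^{-1/2}\Phi^T \right| \left| \hat{\Sigma}_2 - \lambda \hat{\Sigma}_1 \right| \left| \Phi\,\Theta^{-1/2} \right| = 0 \tag{2.6} $$

Since $\Theta^{-1/2}\Phi^T$ is nonsingular, it follows that

$$ \left| \hat{\Sigma}_2 - \lambda \hat{\Sigma}_1 \right| = 0 \tag{2.7} $$

or

$$ \left| \hat{\Sigma}_1^{-1}\hat{\Sigma}_2 - \lambda I \right| = 0 \tag{2.8} $$

So only the eigenvalue and eigenvector matrices of $\hat{\Sigma}_1^{-1}\hat{\Sigma}_2$ need be calculated.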
The eigenvalue matrix is then $\Lambda$, and the eigenvector matrix, with its columns scaled so that $A^T \hat{\Sigma}_1 A = I$, is A; its transpose $A^T$ serves as the transformation matrix, i.e., the transformed features are $y = A^T x$.

The idea behind simultaneous diagonalization is to 
transform the original features into a new space where the 
features are independent and then choose a subset of these 
features in the new space which is optimal with respect to 
the probability of error. This is illustrated in Figure 
2.3. 
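The construction of equations (2.1) to (2.3) can be sketched in NumPy as follows. This is a minimal sketch assuming both estimated covariance matrices are symmetric and that the class 1 covariance is positive definite (which fails, for example, when a class has no more training samples than features). It follows the two-step route of whitening the class 1 covariance and then diagonalizing K, which yields the same eigenvalues as the inverse of the class 1 covariance times the class 2 covariance while keeping every eigenproblem symmetric. The function name is hypothetical.

```python
import numpy as np

def simultaneous_diagonalization(s1, s2):
    """Return (A, lam) with A.T @ s1 @ A = I and A.T @ s2 @ A = diag(lam).

    Step 1 whitens s1 with its eigenvalue/eigenvector matrices (Theta, Phi),
    as in (2.2); step 2 diagonalizes the whitened s2, the matrix K of (2.3).
    """
    theta, phi = np.linalg.eigh(s1)                # s1 = Phi diag(Theta) Phi^T
    whiten = phi @ np.diag(theta ** -0.5)          # whiten.T @ s1 @ whiten = I
    k = whiten.T @ s2 @ whiten                     # symmetric, generally not diagonal
    lam, psi = np.linalg.eigh(k)                   # K = Psi diag(lam) Psi^T
    return whiten @ psi, lam

s1 = np.array([[2.0, 0.5], [0.5, 1.0]])
s2 = np.array([[1.0, -0.3], [-0.3, 3.0]])
a, lam = simultaneous_diagonalization(s1, s2)
# Transformed features for a sample x (a 1-D array): y = a.T @ x
```

The returned columns of `a` are eigenvectors of the inverse of `s1` times `s2`, already scaled so that `a.T @ s1 @ a` is the identity.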

2.3 Feature Selection 

Before proceeding to discuss the approximation algor- 
ithms to estimate the probability of error, we digress 
briefly to discuss how the features are ordered. 

The literature offers many studies comparing different separability measures and their effectiveness in choosing the best feature subset (see (9,13,18,62,65)). It appears that the Bhattacharyya distance is one of the most suitable separability measures for distinguishing between classes. Thus, it will be used as a basis for feature selection. The fact that the features are independent allows us to determine the effect of each feature on the probability of error separately.
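The Bhattacharyya distance for two normal distributions can be expressed as follows:

$$B = \frac{1}{8}(M_1 - M_2)^T\left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1}(M_1 - M_2) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}} \qquad (2.9)$$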


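After the simultaneous diagonalization transformation, however, $B$ can be expressed as:

$$B = \sum_{i=1}^{p}\left[\frac{1}{4}\,\frac{d_i^2}{1 + \lambda_i} + \frac{1}{2}\ln\frac{1 + \lambda_i}{2\sqrt{\lambda_i}}\right] \qquad (2.10)$$

where $d_i$ is the $i$th element of the transformed difference of the class-conditional means, $D = A^T(\hat M_2 - \hat M_1)$, and $\lambda_i$ is the $i$th diagonal element of $\Lambda$.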

Thus, for every feature $i$, the corresponding term of $B$ can be calculated separately. The feature with the largest term is the best feature, the one with the second largest is the second best, and so on. Moreover, because $B$ is a sum of independent per-feature terms, the two individually best features also form the best pair, and so on.
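Under the same NumPy assumption, the ordering can be sketched as below: it evaluates each term of (2.10) in the diagonalized space and ranks the features. Because $B$ is invariant under nonsingular linear transformations, the per-feature terms should sum to the full distance (2.9) computed in the original space, which the sketch uses as a check. Function names are illustrative only.

```python
import numpy as np


def simultaneous_diagonalization(sigma1, sigma2):
    """Compact (2.2)-(2.3) construction: A.T s1 A = I, A.T s2 A = diag(lam)."""
    theta, phi = np.linalg.eigh(sigma1)
    whiten = phi / np.sqrt(theta)
    lam, psi = np.linalg.eigh(whiten.T @ sigma2 @ whiten)
    return whiten @ psi, lam


def bhattacharyya(m1, s1, m2, s2):
    """Full Bhattacharyya distance (2.9) between N(m1, s1) and N(m2, s2)."""
    s = 0.5 * (s1 + s2)
    dm = m1 - m2
    mean_term = 0.125 * dm @ np.linalg.solve(s, dm)
    cov_term = 0.5 * (np.linalg.slogdet(s)[1]
                      - 0.5 * (np.linalg.slogdet(s1)[1] + np.linalg.slogdet(s2)[1]))
    return mean_term + cov_term


def rank_features(m1, s1, m2, s2):
    """Per-feature terms of (2.10) and the feature order, best first."""
    a, lam = simultaneous_diagonalization(s1, s2)
    d = a.T @ (m2 - m1)                                    # transformed mean difference D
    terms = 0.25 * d**2 / (1 + lam) + 0.5 * np.log((1 + lam) / (2 * np.sqrt(lam)))
    order = np.argsort(terms)[::-1]                        # largest term = best feature
    return terms, order


m1 = np.array([0.0, 0.0, 0.0])
m2 = np.array([1.0, 0.5, -0.5])
s1 = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
s2 = np.array([[2.0, -0.3, 0.1], [-0.3, 5.0, 1.0], [0.1, 1.0, 1.5]])
terms, order = rank_features(m1, s1, m2, s2)
# B is additive over the uncoupled features, so the terms sum to (2.9).
print(np.isclose(terms.sum(), bhattacharyya(m1, s1, m2, s2)))  # True
```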
2.4 Simulation Algorithm

2.4.1 Need for a Simulation Algorithm

For remote sensing data analysis, several assumptions
are commonly made. These assumptions are usually that the 
data are class-conditionally distributed multivariate normal 
and that the data used to train the classifier are represen- 
tative of the area of interest. This second assumption 
actually has several parts. The assumption is made that in 
the process of training, all classes present in the scene 
are found, and all spectral subclasses of each class are 
also represented in the training data. Furthermore, the 
parameters of the distribution of each subclass are also 
assumed to be known from the training data. Each pixel is 
assumed to come from one of the training classes, and also 
is assumed to be entirely of one cover type. 

In actual practice, these assumptions are not met. The 
number of spectral classes in the area is not known and 
clustering or some other method is used to determine the 
number of subclasses, in addition to estimating the statis- 
tics of those subclasses. Some of these methods also lead 
to non-normal subclasses. In particular, the clustering 
algorithm available through LARSYS truncates the tails of 
the subclass distributions and so leads to non-normal distributions.
There are also questions relating to a single picture
element. A single pixel in Landsat data covers an area 
approximately 60 meters by 50 meters. More than one cover 
type may be present in this area and result in a "mixture 
pixel" observation. It is not clear how the distribution of 
the spectral response of mixture pixels can be related to 
the distribution of the spectral response of "pure pixels". 

There has been much speculation in the remote sensing 
community as to the effect of the non-satisfaction of the 
basic assumptions. Whenever new algorithms are brought 
forth, the old question is raised again, indicating that 
there is insufficient understanding of the interaction of 
the real attributes of the data and the theory of the algorithms. At times it is not clear whether a particular result is due to aspects of the algorithm or to the extent to which the data set deviates from the assumptions.

In testing new algorithms, deviations from the assump- 
tions may obscure the action of the new process. One way to 
clarify the situation is to apply the algorithm first to a 
data set satisfying the assumptions. 

Such a data set could be obtained artificially, through simulation. The analyst would then know how many classes exist in the data and the true distributions of the classes, including normality if desired; in addition, the observations could be truly independent, and no pixel would be a "mixture pixel". New algorithms could be studied on such a data set with the knowledge that any "strange" effects are indeed algorithm problems rather than data problems.

In many cases where simulated data have been used in 
the past, the data were too artificial, in the sense that 
all aspects of the image were controlled, removing the 
natural variation in object size, position, and relationship 
which occur in real data. This limited the use of the simu- 
lated data sets in testing new algorithms. 

The natural spatial information occurring in multispectral data could be retained in a simulated image by spatially basing the simulation on a classification. It would be even better to base the simulated data on a digitized "ground truth" map if the spectral characteristics of the cover types were known. By basing the simulation on a classification, the number of classes, their exact distributions, and the class of each pixel in the area are known. If the classification is sufficiently accurate, the spatial information held in the classification map will be close to the actual cover type map and the actual spatial content of the original data. For each pixel in the area, a random vector distributed according to the pixel's class statistics can be generated; this becomes the simulated data vector.
This simulation method was reported in LARS Technical Report 070980 (66), and the program will be used for testing the error estimator developed.


2.4.2 Statistical Background 

From the classification chosen as a basis for the simulation, the following are known: the number of classes $K$; the set of classes $\{\omega_i,\ i = 1,\ldots,K\}$; the class distributions $\{f(X/\omega_i),\ i = 1,\ldots,K\}$; their means and covariances ($M_i$ and $\Sigma_i$, $i = 1,\ldots,K$); the number of channels $p$; and the class of every pixel in the scene.

From classical statistics:

(1) Let $X: p\times 1$, $A: p\times p$, and $b: p\times 1$. If $X \sim N(0, I_p)$, then $Y = AX + b \sim N(b,\ A I_p A^T) = N(b,\ AA^T)$, where $I_p$ is the identity matrix of dimensionality $p$.

(2) Let $\Sigma$ be a symmetric, positive definite matrix. Then there exists $A$ such that $AA^T = \Sigma$ ($A$ is denoted $\Sigma^{1/2}$).

To simulate a pixel which was a member of class $i$ in the base classification, a random vector $X \sim N(0, I_p)$ is generated, independently of the vectors for all other pixels (see Appendix A). Next, $Y = \Sigma_i^{1/2}X + M_i$ is calculated; by (1) and (2), it is then a random vector from the population $N(M_i, \Sigma_i)$. This process is repeated for each pixel of the base classification, and the random vectors thus generated are stored appropriately, i.e., so as to correspond to their simulated spatial location.
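The per-pixel procedure can be sketched as below. This is an illustration, not the LARSYS/Fortran program of Appendix E: it assumes NumPy, an integer class map held in memory instead of a results tape, and a Cholesky factor as the matrix $\Sigma_i^{1/2}$ of statement (2) (any $A$ with $AA^T = \Sigma_i$ would do). The function name `simulate_scene` is hypothetical.

```python
import numpy as np


def simulate_scene(class_map, means, covs, seed=None):
    """Simulate a p-channel image from a classification map.

    For every pixel of class i, draw X ~ N(0, I_p) independently and set
    Y = L_i X + M_i, where L_i is the Cholesky factor (L_i L_i^T = Sigma_i).
    """
    rng = np.random.default_rng(seed)
    p = means.shape[1]
    image = np.empty(class_map.shape + (p,))
    for i, (mean, cov) in enumerate(zip(means, covs)):
        root = np.linalg.cholesky(cov)                 # one choice of Sigma_i^{1/2}
        mask = class_map == i                          # pixels labelled i in the base classification
        x = rng.standard_normal((int(mask.sum()), p))  # independent N(0, I_p) vectors, one per pixel
        image[mask] = x @ root.T + mean                # row-vector form of Y = L X + M_i
    return image


# Usage: a 200 x 200 map whose right half is class 1 (hypothetical statistics).
class_map = np.zeros((200, 200), dtype=int)
class_map[:, 100:] = 1
means = np.array([[0.0, 0.0, 0.0], [5.0, -1.0, 2.0]])
covs = np.array([np.eye(3),
                 [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]]])
image = simulate_scene(class_map, means, covs, seed=1)
```

Each pixel keeps its position in the class map, so the spatial layout of the base classification carries over to the simulated image while the spectral values follow exactly the class statistics.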


The program requires as input a classification map stored on a results tape, on which the class statistics for $p$ dimensions are also stored. The program then uses the results map and the stored statistics to generate a $p$-dimensional data set, which is stored on a user-specified output tape in LARSYS format.

Appendix A provides a mathematical derivation related to the generation of normally distributed samples. Appendix E provides a Fortran program listing for the simulation program.


With all the preliminaries discussed, we are now ready to begin our discussion of the error estimator algorithm.
CHAPTER 3

PERFORMANCE ESTIMATOR: 
APPROXIMATION TO THE PROBABILITY OF ERROR 


3.1 The Likelihood Function 

As mentioned earlier, our goal is to develop a performance estimator that can predict where the peak in the Hughes curve occurs. Some of the most serious difficulties facing researchers in trying to estimate the probability of error in multidimensional analysis are:

1. The need to carry out a multiple integration of the multivariate probability density function. Most often, this integration is almost impossible to carry out analytically, and a numerical integration, which is often costly, has to be performed.

2. The measurement features are often correlated, making it difficult to assess the importance of each feature separately on the probability of error.

3. In most cases, one has to deal with multiclass problems (more than two classes), which further complicates the multivariate probability density functions.
It would be much easier, therefore, if one could work with a function that is one-dimensional but carries all the information present. Fortunately, since we are looking at two classes at a time in a binary tree procedure, such a function does exist; it is called the likelihood function (minus the log of the likelihood ratio). See, for example, (66).


The likelihood function, denoted $h(X)$, is given by:

$$h(X) = -\ln\frac{p(X/\omega_1)}{p(X/\omega_2)} \qquad (3.1)$$

where $p(X/\omega_i)$ is the probability density function of $X$ given $\omega_i$.


In remote sensing applications, the assumption of mul- 
tivariate class-conditional normal distributions is almost 
always invoked, and will be consistently used in this work. 


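Using this assumption, $p(X/\omega_i)$ becomes:

$$p(X/\omega_i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_i|^{1/2}}\exp\left(-\frac{1}{2}(X - M_i)^T\Sigma_i^{-1}(X - M_i)\right) \qquad (3.2)$$

where $M_i$ is the mean vector of class $i$, $\Sigma_i$ is the covariance matrix of class $i$, and $p$ is the number of dimensions.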
In practice, $M_i$ and $\Sigma_i$ are estimated from the training samples and are replaced by $\hat M_i$ and $\hat\Sigma_i$.


The Bayes decision rule for minimum error may be written as follows:

$$P(\omega_1/X) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P(\omega_2/X) \qquad (3.3)$$


The a posteriori probabilities $P(\omega_i/X)$ may be calculated from the a priori probabilities $P(\omega_i)$ and the conditional density functions $p(X/\omega_i)$ using Bayes' theorem, i.e.,

$$P(\omega_i/X) = \frac{p(X/\omega_i)\,P(\omega_i)}{p(X)} \qquad (3.4)$$


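Since $p(X)$ is common to both sides of the inequality (3.3), the decision rule can be expressed as:

$$p(X/\omega_1)\,P(\omega_1) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} p(X/\omega_2)\,P(\omega_2) \qquad (3.5)$$

or, in terms of the likelihood ratio $\ell(X)$,

$$\ell(X) = \frac{p(X/\omega_1)}{p(X/\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P(\omega_2)}{P(\omega_1)} \qquad (3.6)$$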


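$h(X)$ can then be written as:

$$h(X) = -\ln\ell(X) = \frac{1}{2}(X - M_1)^T\Sigma_1^{-1}(X - M_1) - \frac{1}{2}(X - M_2)^T\Sigma_2^{-1}(X - M_2) + \frac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_2|} \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{P(\omega_1)}{P(\omega_2)} \qquad (3.7)$$

In practice, since $M_i$ and $\Sigma_i$ are replaced by $\hat M_i$ and $\hat\Sigma_i$, $h(X)$ becomes (after moving $\ln P(\omega_1)/P(\omega_2)$ to the L.H.S.)

$$h(X) = \frac{1}{2}(X - \hat M_1)^T\hat\Sigma_1^{-1}(X - \hat M_1) - \frac{1}{2}(X - \hat M_2)^T\hat\Sigma_2^{-1}(X - \hat M_2) + \frac{1}{2}\ln\frac{|\hat\Sigma_1|}{|\hat\Sigma_2|} - \ln\frac{P(\omega_1)}{P(\omega_2)} \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} 0 \qquad (3.8)$$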


The Bayes test for minimum error then reduces to looking at the value of $h(X)$: measurements with positive values are assigned to class 2, and measurements with negative values to class 1.


Note that $h(X)$ is a one-dimensional random variable. The problem, then, is to know or estimate the probability density function of $h(X)$. Once that is known, the probability of error can be obtained by carrying out a scalar integration. Figure 3.1 shows the probability density functions of $h(X)$ given either class 1 or class 2.


The probability of error can be calculated as:

$$\varepsilon = P(\text{error}) = P(\text{error}/\omega_1)\,P(\omega_1) + P(\text{error}/\omega_2)\,P(\omega_2) \qquad (3.9)$$


Let the domain, or decision space, of $X$ be divided into regions $\Gamma_1$ and $\Gamma_2$. Then, if a sample belongs to $\omega_1$, an error occurs whenever $X \in \Gamma_2$. Similarly, if a sample belongs to $\omega_2$, an error occurs whenever $X \in \Gamma_1$. Thus,

$$\varepsilon = P(X \in \Gamma_2/\omega_1)\,P(\omega_1) + P(X \in \Gamma_1/\omega_2)\,P(\omega_2) \qquad (3.10)$$

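In terms of the probability density functions of $h(X)$ given $\omega_i$, this becomes:

$$\varepsilon = P(\omega_1)\int_{0}^{\infty} p(h/\omega_1)\,dh + P(\omega_2)\int_{-\infty}^{0} p(h/\omega_2)\,dh = \varepsilon_1 + \varepsilon_2 \qquad (3.11)$$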

The probability of error is then the sum of the areas under the two curves of Figure 3.1 that lie on the wrong side of $h = 0$, each weighted by the corresponding prior probability. The objective is to develop an algorithm which will approximate the class-conditional probability density functions of $h(X)$, and hence the probability of error.
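As a point of reference (not part of the proposed estimator), (3.8) and (3.11) can be checked by brute force: draw samples from each class, evaluate $h(X)$, and count the samples falling on the wrong side of zero. The NumPy sketch below assumes known means and covariances and uses hypothetical function names. When $\Sigma_1 = \Sigma_2$ and the priors are equal, the Bayes error has the closed form $\Phi(-\Delta/2)$, where $\Delta$ is the Mahalanobis distance between the means, which gives a simple check.

```python
import math

import numpy as np


def h_statistic(x, m1, s1, m2, s2, p1=0.5, p2=0.5):
    """h(X) of (3.8) for each row of x; negative -> class 1, positive -> class 2."""
    def half_quad(m, s):
        diff = x - m
        # row-wise 0.5 * diff^T s^{-1} diff
        return 0.5 * np.einsum("ij,ij->i", diff, np.linalg.solve(s, diff.T).T)

    logdet1 = np.linalg.slogdet(s1)[1]
    logdet2 = np.linalg.slogdet(s2)[1]
    return (half_quad(m1, s1) - half_quad(m2, s2)
            + 0.5 * (logdet1 - logdet2) - math.log(p1 / p2))


def monte_carlo_error(m1, s1, m2, s2, p1=0.5, n=200_000, seed=0):
    """Brute-force estimate of (3.11): eps1 = P(w1) P(h > 0 | w1), eps2 = P(w2) P(h < 0 | w2)."""
    rng = np.random.default_rng(seed)
    p2 = 1.0 - p1
    x1 = rng.multivariate_normal(m1, s1, size=n)   # samples from class 1
    x2 = rng.multivariate_normal(m2, s2, size=n)   # samples from class 2
    eps1 = p1 * np.mean(h_statistic(x1, m1, s1, m2, s2, p1, p2) > 0)
    eps2 = p2 * np.mean(h_statistic(x2, m1, s1, m2, s2, p1, p2) < 0)
    return eps1, eps2


# Equal covariances and priors: Bayes error = Phi(-Delta / 2); here Delta = 2.
m1, m2 = np.zeros(2), np.array([2.0, 0.0])
eps1, eps2 = monte_carlo_error(m1, np.eye(2), m2, np.eye(2))
exact = 0.5 * (1.0 + math.erf(-1.0 / math.sqrt(2.0)))   # Phi(-1), about 0.1587
```

Such a simulation is far too costly to run inside a feature selection loop, which is precisely why an analytic approximation of the density of $h(X)$ is sought; it is useful only as an independent check.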


3.2 Performance Estimator 

Fukunaga and Krile (64) developed an algorithm that approximates the density of $h(X)$. This algorithm assumes two classes with multivariate normal distributions, and was tested using one eight-dimensional simulated data set.
The algorithm, however, assumes that the training samples are sufficient to estimate the true statistics of the distributions reasonably well, and hence does not take into account the Hughes phenomenon. In other words, when the training samples are few and do not reflect the true statistics of the distributions, the algorithm treats the statistics obtained from the training samples as a "perfect" estimation of some "wrong" distributions, when in fact they are an "imperfect" estimation of the true statistics.


It is this algorithm, proposed by Fukunaga and Krile, that we will use and modify to take into account the Hughes phenomenon. Therefore, it seems appropriate to explain the algorithm in detail, and then discuss the modifications made to it.

3.2.1 The Normal Assumption 

Looking at equation (3.8), since $h(X)$ is in general a quadratic function of a normal random vector $X$, it cannot in general be normally distributed. However, in the case where $\hat\Sigma_1 = \hat\Sigma_2$, $h(X)$ becomes a linear function of $X$ and hence is normally distributed.

In most cases, however, $\hat\Sigma_1 \ne \hat\Sigma_2$. Fukunaga and Krile still tried to assume that $h(X)$ is normally distributed.
An algorithm was developed and tested in this research under the assumption that $h(X)$ is normally distributed (although $\hat\Sigma_1 \ne \hat\Sigma_2$), but results showed it to be a very poor approximation of the probability of error, and hence it was not analyzed further.


3.2.2 The Modified Gamma Distribution Assumption: Fukunaga and Krile Version

Consider $h(X)$ as given by equation (3.8). Applying the simultaneous diagonalization technique described earlier, $\hat\Sigma_1$ is transformed to the identity matrix $I$, and $\hat\Sigma_2$ is transformed to a diagonal matrix $\Lambda$. The transformation matrix is denoted $A^T$, the transpose of the eigenvector matrix $A$.

Without loss of generality, we place the origin of the coordinate system such that:

$$\hat M_1 = 0 \quad\text{and}\quad m_2 = \hat M_2 - \hat M_1 \qquad (3.12)$$

With $X \in \omega_1$, $h(X)$ can be written as a function of $Y$, where $Y = A^T X$, as follows:

$$h(Y/\omega_1) = \frac{1}{2}\left[Y^T Y - (Y - D)^T\Lambda^{-1}(Y - D) + \ln\frac{|I|}{|\Lambda|} - 2\ln\frac{P(\omega_1)}{P(\omega_2)}\right] \qquad (3.13)$$

where $D = A^T m_2$.
Since the features are now uncoupled, this can be written (after completing the square in each $y_i$) as:

$$h(Y/\omega_1) = \frac{1}{2}\sum_{i=1}^{p}\left[\left(1 - \frac{1}{\lambda_i}\right)\left(y_i + \frac{d_i}{\lambda_i - 1}\right)^2 - \frac{d_i^2}{\lambda_i - 1} - \ln\lambda_i\right] - \ln\frac{P(\omega_1)}{P(\omega_2)} \qquad (3.14)$$

where $p$ is the number of dimensions and $d_i$ is the $i$th element of vector $D$.


Now we have $h(Y/\omega_1)$ in terms of $p$ independent Gaussian random variables $y_i$, each of which has zero mean and unit variance with respect to class $\omega_1$.

With $X \in \omega_2$, $Y$ has mean $D$ and covariance matrix $\Lambda$. Defining a new transformed variable $Z$ and a transformed difference-of-means vector $v$ as follows:

$$Z = \Lambda^{-1/2}(Y - D) \qquad (3.15)$$

$$v = \Lambda^{-1/2}D \qquad (3.16)$$

$h(X/\omega_2)$ can be expressed as a function of the new variable $Z$ and of $v$ by substituting (3.15) and (3.16) into (3.8) as follows:


$$h(Z/\omega_2) = \frac{1}{2}\left[(Z + v)^T\Lambda(Z + v) - Z^T Z + \ln\frac{|I|}{|\Lambda|} - 2\ln\frac{P(\omega_1)}{P(\omega_2)}\right] \qquad (3.17)$$


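Again, since the features are uncoupled, we can write $h(Z/\omega_2)$ as follows:

$$\begin{aligned} h(Z/\omega_2) &= \frac{1}{2}\sum_{i=1}^{p}\left[\lambda_i(z_i + v_i)^2 - z_i^2 - \ln\lambda_i\right] - \ln\frac{P(\omega_1)}{P(\omega_2)} \\ &= \frac{1}{2}\sum_{i=1}^{p}\left[(\lambda_i - 1)\left(z_i + \frac{\lambda_i v_i}{\lambda_i - 1}\right)^2 - \frac{\lambda_i v_i^2}{\lambda_i - 1} - \ln\lambda_i\right] - \ln\frac{P(\omega_1)}{P(\omega_2)} \end{aligned} \qquad (3.18)$$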


Again, we have an expression in terms of $p$ independent Gaussian variables $z_i$, each of which has zero mean and unit variance.


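Next, we define the following quantities for convenience:

$$a_{1i} = \frac{1}{2}\left(1 - \frac{1}{\lambda_i}\right) \qquad (3.19)$$

$$b_{1i} = \frac{d_i}{\lambda_i - 1} \qquad (3.20)$$

$$a_{2i} = \frac{1}{2}(\lambda_i - 1) \qquad (3.21)$$

$$b_{2i} = \frac{\lambda_i v_i}{\lambda_i - 1} = \frac{\lambda_i^{1/2} d_i}{\lambda_i - 1} \qquad (3.22)$$

$$C = -\frac{1}{2}\left[\sum_{i=1}^{p}\left(\ln\lambda_i + \frac{d_i^2}{\lambda_i - 1}\right) + 2\ln\frac{P(\omega_1)}{P(\omega_2)}\right] \qquad (3.23)$$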


Substituting equations (3.19)-(3.23) back into equations (3.14) and (3.18), we get:

$$h(Y/\omega_1) = \sum_{i=1}^{p} a_{1i}(y_i + b_{1i})^2 + C \qquad (3.24)$$

$$h(Z/\omega_2) = \sum_{i=1}^{p} a_{2i}(z_i + b_{2i})^2 + C \qquad (3.25)$$

Referring from now on to $Y$ and $Z$ as $\xi$, and to $y_i$ and $z_i$ as $\xi_i$, we find that $h(\xi/\omega_1)$ and $h(\xi/\omega_2)$ have the same functional form, except for the values of $a_{ki}$ and $b_{ki}$ ($k = 1, 2$).
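Because (3.24) and (3.25) are algebraic rearrangements of (3.8), they can be verified numerically for any test vector. The sketch below (NumPy assumed; all function names hypothetical) computes $\lambda_i$, $d_i$, $a_{ki}$, $b_{ki}$ and $C$ from (3.12) and (3.19)-(3.23), and compares both forms against a direct evaluation of (3.8). It assumes no $\lambda_i$ equals 1, since $b_{ki}$ is undefined there.

```python
import numpy as np


def simultaneous_diagonalization(s1, s2):
    """A.T @ s1 @ A = I and A.T @ s2 @ A = diag(lam), via the (2.2)-(2.3) construction."""
    theta, phi = np.linalg.eigh(s1)
    whiten = phi / np.sqrt(theta)
    lam, psi = np.linalg.eigh(whiten.T @ s2 @ whiten)
    return whiten @ psi, lam


def h_direct(x, m1, s1, m2, s2, p1, p2):
    """h(X) evaluated directly from (3.8) for each row of x."""
    q1 = np.einsum("ij,ij->i", x - m1, np.linalg.solve(s1, (x - m1).T).T)
    q2 = np.einsum("ij,ij->i", x - m2, np.linalg.solve(s2, (x - m2).T).T)
    logdet_ratio = np.linalg.slogdet(s1)[1] - np.linalg.slogdet(s2)[1]
    return 0.5 * q1 - 0.5 * q2 + 0.5 * logdet_ratio - np.log(p1 / p2)


def fk_coefficients(m1, s1, m2, s2, p1, p2):
    """Coefficients (3.19)-(3.23), with the origin at M1 as in (3.12)."""
    a, lam = simultaneous_diagonalization(s1, s2)
    d = a.T @ (m2 - m1)                              # D = A^T m2
    a1 = 0.5 * (1.0 - 1.0 / lam)                     # (3.19)
    b1 = d / (lam - 1.0)                             # (3.20)
    a2 = 0.5 * (lam - 1.0)                           # (3.21)
    b2 = np.sqrt(lam) * d / (lam - 1.0)              # (3.22)
    c = -0.5 * (np.sum(np.log(lam) + d**2 / (lam - 1.0)) + 2.0 * np.log(p1 / p2))  # (3.23)
    return a, lam, d, a1, b1, a2, b2, c


rng = np.random.default_rng(3)
m1, m2 = np.array([1.0, 0.0, -1.0]), np.array([0.5, 2.0, 0.0])
s1 = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
s2 = np.array([[2.0, -0.3, 0.1], [-0.3, 5.0, 1.0], [0.1, 1.0, 1.5]])
x = rng.normal(size=(5, 3)) * 2.0
a, lam, d, a1, b1, a2, b2, c = fk_coefficients(m1, s1, m2, s2, 0.3, 0.7)
y = (x - m1) @ a                                     # rows of Y = A^T (X - M1)
z = (y - d) / np.sqrt(lam)                           # rows of Z = Lambda^{-1/2} (Y - D), (3.15)
h_324 = (y + b1) ** 2 @ a1 + c                       # (3.24)
h_325 = (z + b2) ** 2 @ a2 + c                       # (3.25)
```

The two forms differ only in which variable is standard normal: $y_i$ under $\omega_1$ and $z_i$ under $\omega_2$. That is what lets a single functional form describe both class-conditional densities of $h$.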

Theorem 3.1

If $X = (x_1, \ldots, x_p)$, where the $x_i$ are a sample from a Normal$(0, \sigma^2)$ population, then the random variable $V = \sum_{i=1}^{p} x_i^2/\sigma^2$ has a $\chi^2_p$ (chi-square) distribution.

Proof:

See (67), p. 16.
Theorem 3.2

If $s_1, s_2, \ldots, s_p$ are independent random variables, then the density of their sum $s_1 + s_2 + \cdots + s_p$ equals the convolution of their respective densities.

Proof:

See (68), p. 189.


Examining equations (3.24) and (3.25) shows that the density functions of $h(\xi/\omega_1)$ and $h(\xi/\omega_2)$ can be obtained by convolving the densities of $p$ noncentral (because of the $b_{1i}$ and $b_{2i}$ terms) $\chi^2_1$ variables having multiplicative constants $a_{1i}$ and $a_{2i}$, respectively, and adding a shift parameter $C$.


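The density of $h(\xi/\omega_k)$ is divided into three parts, $h(\xi/\omega_k) = V_{kr} + V_{ks} + C$, where

$$V_{kr} = \sum_{a_{ki} \ge 0} a_{ki}(\xi_i + b_{ki})^2 \qquad (3.26)$$

$$V_{ks} = \sum_{a_{kj} < 0} a_{kj}(\xi_j + b_{kj})^2 \qquad (3.27)$$

$$C = -\frac{1}{2}\left[\sum_{i=1}^{p}\left(\ln\lambda_i + \frac{d_i^2}{\lambda_i - 1}\right) + 2\ln\frac{P(\omega_1)}{P(\omega_2)}\right] \qquad (3.28)$$

and $k = 1, 2$. Let $p_r$ and $p_s$ be the numbers of features with $a_{ki} \ge 0$ and $a_{kj} < 0$, respectively, so that $p_r + p_s = p$ (since $a_{1i}$ and $a_{2i}$ are both nonnegative exactly when $\lambda_i \ge 1$, these numbers are the same for both classes). The density function of $V_{kr}$, $p_{kr}(h)$, is the convolution of $p_r$ densities of squared Gaussian variables having multiplicative constants $a_{ki} \ge 0$; all of these densities lie on the positive $h$ axis. Similarly, the density function of $V_{ks}$, $p_{ks}(h)$, is the convolution of $p_s$ densities of squared Gaussian variables with multiplicative constants $a_{kj} < 0$; all of these densities lie on the negative $h$ axis.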

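A gamma density function is given by:

$$g(h; \rho, \lambda) = \frac{\lambda^{\rho}\,h^{\rho - 1}e^{-\lambda h}}{\Gamma(\rho)}, \quad h > 0 \qquad (3.29)$$

Let $k$ be a positive integer. With $\rho = k/2$ and $\lambda = 1/2$, the gamma density $g(h; \rho, \lambda)$ is referred to as the chi-squared density with $k$ degrees of freedom (see (67), p. 13). (Here $\lambda$ denotes the gamma rate parameter, not an eigenvalue.)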


Theorem 3.3

If $X_1, \ldots, X_n$ are independent random variables with gamma distributions $(\rho_1, \lambda), \ldots, (\rho_n, \lambda)$, then $Y = X_1 + \cdots + X_n$ has a gamma distribution $(\rho_1 + \cdots + \rho_n, \lambda)$.

Proof:

See (67), p. 15.


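Since what we have is a summation of weighted (noncentral) chi-squared random variables, and the chi-squared distribution is a special form of the gamma distribution, both $p_{kr}(h)$ and $p_{ks}(h)$ (the latter reflected to the positive side) can be reasonably approximated by a general gamma form, especially for large $p_r$ and $p_s$. Theorem 3.3 does not apply exactly, because the multiplicative constants differ from term to term; hence the result is an approximation: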
$$g(h) = \begin{cases} \dfrac{h^{\alpha}e^{-h/\beta}}{\beta^{\alpha+1}\,\Gamma(\alpha+1)}, & h \ge 0 \\[1ex] 0, & h < 0 \end{cases} \qquad (3.30)$$

The parameters $\alpha$ and $\beta$ can be determined so that the mean $\eta$ and the variance $\sigma^2$ of the "true" distribution match those of the approximation.


Next, we calculate the expected values $\eta_{kr}$ and $\eta_{ks}$ of $V_{kr}$ and $V_{ks}$, and the variances $\sigma_{kr}^2$ and $\sigma_{ks}^2$. Since each $\xi_i$ has zero mean and unit variance,

$$\eta_{kr} = E\{V_{kr}\} = \sum_{a_{ki} \ge 0} a_{ki}\,E\{\xi_i^2 + 2b_{ki}\xi_i + b_{ki}^2\} = \sum_{a_{ki} \ge 0} a_{ki}(1 + b_{ki}^2) \quad \text{for } p_{kr}(h) \qquad (3.31)$$

Similarly,

$$\eta_{ks} = \sum_{a_{kj} < 0} a_{kj}(1 + b_{kj}^2) \quad \text{for } p_{ks}(h) \qquad (3.32)$$
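For the variances, the cross terms vanish because $\xi_i$ is independent of $\xi_j$ for $i \ne j$ and each centered term has zero mean (so that, in particular, $E\{\xi_i\xi_j\} = 0$); hence

$$\sigma_{kr}^2 = E\{(V_{kr} - \eta_{kr})^2\} = \sum_{a_{ki} \ge 0} a_{ki}^2\left[E\{(\xi_i + b_{ki})^4\} - (1 + b_{ki}^2)^2\right] \qquad (3.33)$$

where

$$E\{\xi^n\} = \begin{cases} 1\cdot 3\cdots(n-1), & n \text{ even} \\ 0, & n \text{ odd} \end{cases} \qquad (3.34)$$

so that $E\{(\xi_i + b_{ki})^4\} = 3 + 6b_{ki}^2 + b_{ki}^4$ and

$$\sigma_{kr}^2 = 2\sum_{a_{ki} \ge 0} a_{ki}^2(1 + 2b_{ki}^2) \quad \text{for } p_{kr}(h) \qquad (3.35)$$

Similarly,

$$\sigma_{ks}^2 = 2\sum_{a_{kj} < 0} a_{kj}^2(1 + 2b_{kj}^2) \quad \text{for } p_{ks}(h) \qquad (3.36)$$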


For a random variable $h$ which has a gamma distribution with parameters $\alpha$ and $\beta$ (see equation (3.30)),

$$E(h) = (\alpha + 1)\beta, \qquad \mathrm{Var}(h) = (\alpha + 1)\beta^2 \qquad (3.37)$$

(See (67), p. 44.)


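Therefore, $\alpha_{kr}$, $\alpha_{ks}$, $\beta_{kr}$, and $\beta_{ks}$ can be calculated as:

$$\alpha_{kr} = \frac{\eta_{kr}^2}{\sigma_{kr}^2} - 1 \qquad (3.38)$$

$$\alpha_{ks} = \frac{\eta_{ks}^2}{\sigma_{ks}^2} - 1 \qquad (3.39)$$

$$\beta_{kr} = \frac{\sigma_{kr}^2}{\eta_{kr}} \qquad (3.40)$$

$$\beta_{ks} = \frac{\sigma_{ks}^2}{|\eta_{ks}|} \qquad (3.41)$$

where $\eta_{ks} < 0$, since $p_{ks}(h)$ is reflected to the positive side before being approximated.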


The density function $p(h/\omega_k)$, $k = 1, 2$, which is our final goal, is then the convolution of two gamma densities with a constant shift: one is distributed on the positive side of the $h$ axis, and the other on the negative side.
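A minimal NumPy sketch of the moment matching in (3.31)-(3.37) follows. It assumes the coefficient arrays $a_{ki}$ and $b_{ki}$ for one class are already available (for example, from (3.19)-(3.22)); the function names and the example coefficients are illustrative. A useful sanity check is the single term $a = 1$, $b = 0$: this is a $\chi^2_1$ variable, i.e. a gamma density with $\alpha + 1 = 1/2$ and $\beta = 2$.

```python
import numpy as np


def split_moments(a, b):
    """Means (3.31)-(3.32) and variances (3.35)-(3.36) of V_kr (a >= 0) and V_ks (a < 0)."""
    pos = a >= 0
    neg = ~pos
    eta_r = np.sum(a[pos] * (1 + b[pos] ** 2))
    eta_s = np.sum(a[neg] * (1 + b[neg] ** 2))              # negative whenever any a < 0
    var_r = 2 * np.sum(a[pos] ** 2 * (1 + 2 * b[pos] ** 2))
    var_s = 2 * np.sum(a[neg] ** 2 * (1 + 2 * b[neg] ** 2))
    return eta_r, var_r, eta_s, var_s


def gamma_by_moments(eta, var):
    """Solve (3.37) for alpha and beta; |eta| handles the reflected negative part V_ks."""
    eta = abs(eta)
    alpha = eta ** 2 / var - 1.0
    beta = var / eta
    return alpha, beta


# Compare the analytic moments with a simulation (hypothetical coefficients).
a = np.array([0.8, 0.3, -0.4])
b = np.array([0.5, -1.0, 2.0])
eta_r, var_r, eta_s, var_s = split_moments(a, b)
xi = np.random.default_rng(2).standard_normal((400_000, 3))
v_r = (xi[:, :2] + b[:2]) ** 2 @ a[:2]                      # samples of V_kr (the two a >= 0 terms)
```

If a class has no negative coefficients, `eta_s` is zero and `gamma_by_moments` should not be called for that part.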
However, the convolution of these two gamma densities is hard to obtain as an explicit mathematical expression because, in general, $\alpha$ is not an integer. Since we do not favor a numerical integration technique for calculating the error rate, a "modified" gamma distribution is proposed as follows:

$$g'(h) = \begin{cases} \dfrac{(h - c)^{\gamma}e^{-(h - c)/\beta}}{\beta^{\gamma+1}\,\Gamma(\gamma+1)}, & h \ge c \\[1ex] 0, & h < c \end{cases} \qquad \gamma = 0 \text{ or } 1 \qquad (3.42)$$

In other words, gamma density curves are roughly categorized into two types: $\exp(-h/\beta)$ ($\gamma = 0$) and $h\exp(-h/\beta)$ ($\gamma = 1$), depending on whether $\alpha$ obtained by (3.38) or (3.39) is smaller or larger than a threshold value of 0.35. (The threshold value of 0.35 is a compromise value, chosen in an attempt to match the maximum value, and the location of the maximum, of the gamma density to those of the modified gamma approximation. It is further explained in (64).)


The procedure proposed by Fukunaga and Krile, then, is as follows:

1) Calculate $\eta_{kr}$, $\eta_{ks}$, $\sigma_{kr}^2$, and $\sigma_{ks}^2$ from equations (3.31), (3.32), (3.35), and (3.36).
2) Calculate $\alpha_{kr}$ and $\alpha_{ks}$ from equations (3.38) and (3.39).

3) Set $\gamma_{kr} = 0$ if $\alpha_{kr} < 0.35$, and $\gamma_{kr} = 1$ if $\alpha_{kr} \ge 0.35$. Similarly for $\gamma_{ks}$.

4) Calculate $\beta_{kr}$, $\beta_{ks}$, $c_{kr}$, and $c_{ks}$ by the following equations (modified forms of equations (3.38)-(3.41)):

(3.43)

(3.44)

(3.45)

(3.46)

Equations (3.43)-(3.46) are the same as (3.38)-(3.41), except for the shift of the mean by $c_{kr}$ or $c_{ks}$.
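Equations (3.43)-(3.46) are not reproduced here. One way to realize steps 3 and 4, assumed for illustration only and possibly differing in detail from (64), is to fix $\gamma$ by the 0.35 threshold and then choose $\beta$ and the shift $c$ so that $g'(h)$ of (3.42) has the same mean and variance as the corresponding part of $h$. Since $g'$ has mean $c + (\gamma+1)\beta$ and variance $(\gamma+1)\beta^2$, this gives $\beta = \sigma/\sqrt{\gamma+1}$ and $c = \eta - (\gamma+1)\beta$. The Python sketch below (hypothetical names; for $p_{ks}(h)$, pass the reflected mean $|\eta_{ks}|$) implements that assumption together with the density (3.42).

```python
import math


def modified_gamma_params(eta, var, threshold=0.35):
    """Assumed realization of steps 3-4: choose gamma, then match mean and variance.

    eta must be positive (reflect V_ks first). g'(h) of (3.42) has mean
    c + (gamma + 1) * beta and variance (gamma + 1) * beta**2.
    """
    alpha = eta ** 2 / var - 1.0                 # alpha from the moment match (3.37)
    gamma = 1 if alpha >= threshold else 0       # step 3: 0.35 threshold
    beta = math.sqrt(var / (gamma + 1))          # variance match
    c = eta - (gamma + 1) * beta                 # mean match, via the shift c
    return gamma, beta, c


def modified_gamma_pdf(h, gamma, beta, c):
    """g'(h) of (3.42)."""
    if h < c:
        return 0.0
    u = h - c
    return u ** gamma * math.exp(-u / beta) / (beta ** (gamma + 1) * math.gamma(gamma + 1))


# Example: eta = 4, var = 2 gives alpha = 7, hence gamma = 1, beta = 1, c = 2.
gamma, beta, c = modified_gamma_params(4.0, 2.0)
```

Restricting $\gamma$ to 0 or 1 keeps the exponents integral, which is what makes a closed-form convolution of the two parts possible.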

The convolution of $p'_{kr}(h)$ and $p'_{ks}(h)$, denoted $p^*_k(h)$, $k = 1, 2$, can be obtained as an explicit expression. The result is (see (64) for details):

(3.47)


Defining the distance $d_k$ as:

(3.48)

we can find $\varepsilon_1$ by integrating $p^*_1(t)$ from $d_1$ to $\infty$, and $\varepsilon_2$ by integrating $p^*_2(t)$ from $-\infty$ to $d_2$. The term $d_k$ brings the shift parameter $C$ back into the picture, and also accounts for the displacement of the $p(h/\omega_k)$ approximations by $c_{kr}$ and $c_{ks}$. In general,
$$D^*(d_k) = \int_{-\infty}^{d_k} p^*_k(t)\,dt \qquad (3.49)$$

where $D^*(d_k)$ is the approximation for Prob$(h/\omega_k < 0)$.

Thus, the approximated values of the recognition errors are:

$$\varepsilon_1 = P(\omega_1)\left(1 - D^*(d_1)\right) \qquad (3.50)$$

$$\varepsilon_2 = P(\omega_2)\,D^*(d_2) \qquad (3.51)$$


3.2.3 Proposed Modified Algorithm

Figure 3.2 shows a flowchart of Fukunaga and Krile's algorithm. The algorithm assumes that the training statistics are an accurate representation of the true statistics of the two distributions. Consequently, the probability of correct classification that the algorithm projects is monotonically non-decreasing as a function of dimensionality. It is this drawback that we try to correct, so that the algorithm takes into account the number of samples used for training.

Looking back at the calculation of the parameters of the modified gamma distribution, we see that all of them depend on two parameters, $\hat\eta$ and $\hat\sigma^2$, the mean and variance of $h$. If these parameters are inaccurate, then all of the other parameters will be affected.




Figure 3.2 A Flowchart of Fukunaga and Krile's Algorithm.

We propose to look at the way these parameters, particularly $\hat\sigma_1^2$ and $\hat\sigma_2^2$, are distributed as a function of the number of training samples. We then want to incorporate that information in our estimation of $\sigma_1^2$ and $\sigma_2^2$, so that the algorithm has a more realistic picture of what the training samples represent.

Estimating the probability density functions of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ is by no means an easy task. For the amount of information that we have, such an estimation is very involved and impractical. A discussion of the difficulties one faces in attempting such an estimation is found in Appendix B.

We propose instead to look at the variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$, and then incorporate that information in our estimation of these parameters.

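Let us look at $\hat\sigma_1^2$ (Var$(h/\omega_1)$) and $\hat\sigma_2^2$ (Var$(h/\omega_2)$). Since $V_{kr}$ and $V_{ks}$ are independent, summing (3.35) and (3.36) gives

$$\hat\sigma_1^2 = 2\sum_{i=1}^{p} a_{1i}^2(1 + 2b_{1i}^2)$$

Substituting for $a_{1i}$ and $b_{1i}$ their values from (3.19) and (3.20), we get:

$$\hat\sigma_1^2 = \frac{1}{2}\sum_{i=1}^{p}\left(1 - \frac{1}{\lambda_i}\right)^2\left(1 + \frac{2d_i^2}{(\lambda_i - 1)^2}\right) \qquad (3.52)$$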
After multiplying, this reduces to: 
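$$\hat\sigma_1^2 = \frac{1}{2}\sum_{i=1}^{p}\left[\left(1 - \frac{1}{\lambda_i}\right)^2 + \frac{2d_i^2}{\lambda_i^2}\right] \qquad (3.53)$$

In matrix form, this can be written as:

$$\hat\sigma_1^2 = \frac{1}{2}\left(\mathrm{tr}\,(I - \Lambda^{-1})^2 + 2D^T\Lambda^{-2}D\right) \qquad (3.54)$$

Or, in terms of the original distributions:

$$\hat\sigma_1^2 = \frac{1}{2}\left(\mathrm{tr}\,(I - \hat\Sigma_2^{-1}\hat\Sigma_1)^2 + 2m_2^T\hat\Sigma_2^{-1}\hat\Sigma_1\hat\Sigma_2^{-1}m_2\right) \qquad (3.55)$$

(See (64).)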

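Similarly,

$$\hat\sigma_2^2 = 2\sum_{i=1}^{p} a_{2i}^2(1 + 2b_{2i}^2) = \frac{1}{2}\sum_{i=1}^{p}(\lambda_i - 1)^2\left(1 + \frac{2\lambda_i d_i^2}{(\lambda_i - 1)^2}\right) = \frac{1}{2}\sum_{i=1}^{p}\left(\lambda_i^2 + 2(d_i^2 - 1)\lambda_i + 1\right) \qquad (3.56)$$

In matrix form, $\hat\sigma_2^2$ can be written as:

$$\hat\sigma_2^2 = \frac{1}{2}\left(\mathrm{tr}\,(\Lambda - I)^2 + 2D^T\Lambda D\right) \qquad (3.57)$$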
Or, in terms of the original distributions:

$$\hat\sigma_2^2 = \frac{1}{2}\left(\mathrm{tr}\,(\hat\Sigma_1^{-1}\hat\Sigma_2 - I)^2 + 2m_2^T\hat\Sigma_1^{-1}\hat\Sigma_2\hat\Sigma_1^{-1}m_2\right) \qquad (3.58)$$

(See (64).)
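The equivalence of the diagonalized forms (3.54), (3.57) and the original-space forms (3.55), (3.58) can be checked numerically. The NumPy sketch below is illustrative (hypothetical names and example statistics); it repeats a compact simultaneous diagonalization so that it runs on its own, and the check also compares against the coefficient form $2\sum a_{ki}^2(1 + 2b_{ki}^2)$.

```python
import numpy as np


def simultaneous_diagonalization(s1, s2):
    """A.T @ s1 @ A = I, A.T @ s2 @ A = diag(lam), via the (2.2)-(2.3) construction."""
    theta, phi = np.linalg.eigh(s1)
    whiten = phi / np.sqrt(theta)
    lam, psi = np.linalg.eigh(whiten.T @ s2 @ whiten)
    return whiten @ psi, lam


def variances_diagonalized(lam, d):
    """(3.54) and (3.57): Var(h/w1) and Var(h/w2) in the diagonalized space."""
    v1 = 0.5 * (np.sum((1 - 1 / lam) ** 2) + 2 * np.sum(d ** 2 / lam ** 2))
    v2 = 0.5 * (np.sum((lam - 1) ** 2) + 2 * np.sum(lam * d ** 2))
    return v1, v2


def variances_original(m1, s1, m2, s2):
    """(3.55) and (3.58), with m2 taken as M2 - M1 per (3.12)."""
    m = m2 - m1
    eye = np.eye(len(m))
    t1 = eye - np.linalg.solve(s2, s1)           # I - Sigma2^{-1} Sigma1
    t2 = np.linalg.solve(s1, s2) - eye           # Sigma1^{-1} Sigma2 - I
    u1 = np.linalg.solve(s2, m)                  # Sigma2^{-1} m2
    u2 = np.linalg.solve(s1, m)                  # Sigma1^{-1} m2
    v1 = 0.5 * (np.trace(t1 @ t1) + 2 * u1 @ s1 @ u1)
    v2 = 0.5 * (np.trace(t2 @ t2) + 2 * u2 @ s2 @ u2)
    return v1, v2


m1, m2 = np.array([1.0, 0.0, -1.0]), np.array([0.5, 2.0, 0.0])
s1 = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
s2 = np.array([[2.0, -0.3, 0.1], [-0.3, 5.0, 1.0], [0.1, 1.0, 1.5]])
a, lam = simultaneous_diagonalization(s1, s2)
d = a.T @ (m2 - m1)                              # D = A^T m2
```

The original-space forms are convenient because they need no eigen-decomposition, while the diagonalized forms show how each eigenvalue and mean component contributes.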

In order to calculate the variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$, we make the following assumptions:

1. The original and transformed means, $\hat M_1$, $\hat M_2$, and $D$, are assumed to be constant. Experience has shown that first-order statistics can be approximated with relatively few training samples.

2. $\hat\Sigma_1$ and $\hat\Sigma_2$ are independent. That is, we ignore any relationships that might exist between the two classes.

Having assumed the above, the results are (see Appendix C for the complete derivation):
Note that Var$(\hat\sigma_1^2)$ and Var$(\hat\sigma_2^2)$ are inversely proportional to the number of training samples used to estimate the statistics of classes 1 and 2, and directly proportional to the number of dimensions. In other words, as the number of training samples increases, the variances of our estimates of $\sigma_1^2$ and $\sigma_2^2$ decrease, as expected. Also, as the number of dimensions increases, the variances of the estimates increase.

Since we do not have the probability density functions of $\hat\sigma_1^2$ and $\hat\sigma_2^2$, we want a reasonable way to incorporate the effect of the number of training samples into our estimation of $\sigma_1^2$ and $\sigma_2^2$. We propose that a better estimate of the true variances $\sigma_1^2$ and $\sigma_2^2$ consists of our estimates $\hat\sigma_1^2$ and $\hat\sigma_2^2$ plus a multiplicative factor times the standard deviations of these estimates, namely the square roots of Var$(\hat\sigma_1^2)$ and Var$(\hat\sigma_2^2)$ calculated above.

This multiplicative factor was chosen empirically. Experimental results in Chapter 4 show that the variance of the probability of error generally increases with increasing dimensionality, especially when the training data set is very limited. Results also show that the probability of error decreases as the number of training samples increases, and that it is very sensitive to the number of training samples when that number is not much greater than the number of dimensions.

Based on the above observations, the following empirical formula for the multiplicative factor was used:

$$\text{M.F.} = \frac{2p^2}{n_1 n_2} \qquad (3.61)$$

where $p$ is the number of dimensions, and $n_1$ and $n_2$ are the numbers of training samples of classes 1 and 2, as before.

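The new procedure to calculate the probability of error becomes:

1) Calculate $\hat\eta_{kr}$, $\hat\eta_{ks}$, $\hat\sigma_{kr}^2$, and $\hat\sigma_{ks}^2$ from equations (3.31), (3.32), (3.35), and (3.36).

2) Update $\hat\sigma_{kr}^2$ and $\hat\sigma_{ks}^2$ as follows:

$$\hat\sigma_{kr}^2(\text{new}) = \hat\sigma_{kr}^2(\text{old}) + \frac{2p^2}{n_1 n_2}\left(\mathrm{Var}(\hat\sigma_k^2)\right)^{1/2}$$

$$\hat\sigma_{ks}^2(\text{new}) = \hat\sigma_{ks}^2(\text{old}) + \frac{2p^2}{n_1 n_2}\left(\mathrm{Var}(\hat\sigma_k^2)\right)^{1/2}$$

3) Calculate $\alpha_{kr}$ and $\alpha_{ks}$ from equations (3.38) and (3.39); set $\gamma_{kr} = 1$ if $\alpha_{kr} \ge 0.35$, and $\gamma_{kr} = 0$ if $\alpha_{kr} < 0.35$. Similarly for $\gamma_{ks}$.

4) Calculate $\beta_{kr}$, $\beta_{ks}$, $c_{kr}$, and $c_{ks}$ from equations (3.43)-(3.46).

5) Calculate $p^*_k(t)$ and $D^*(d_k)$ from equations (3.47) and (3.49).

6) Calculate the probability of error from equations (3.50) and (3.51).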
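Step 2 can be sketched as below. The variance $\mathrm{Var}(\hat\sigma_k^2)$ itself comes from the Appendix C results, which are not reproduced here, so the sketch takes it as an input; the function names and example numbers are hypothetical.

```python
import math


def multiplicative_factor(p, n1, n2):
    """Empirical factor (3.61): M.F. = 2 p^2 / (n1 n2)."""
    return 2.0 * p ** 2 / (n1 * n2)


def inflate_variance(sigma2_old, var_of_sigma2, p, n1, n2):
    """Step 2: sigma^2(new) = sigma^2(old) + M.F. * sqrt(Var(sigma_hat^2)).

    var_of_sigma2 must be supplied by the caller (Appendix C results).
    """
    return sigma2_old + multiplicative_factor(p, n1, n2) * math.sqrt(var_of_sigma2)


# Hypothetical numbers: 4 features, 20 training samples per class.
mf_small = multiplicative_factor(4, 20, 20)
sigma2_new = inflate_variance(3.0, 0.25, 4, 20, 20)
```

The factor falls off quickly once $n_1 n_2$ is large compared with $p^2$, so the correction matters mainly for small training sets, which is the regime where the Hughes phenomenon appears.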


We are ready now to proceed to Chapter 4, where sev- 
eral experimental results are shown. 
CHAPTER 4

EXPERIMENTAL RESULTS 


4.1 Introduction

Some results on feature selection techniques will be presented first. Next, several experimental results illustrating the Hughes phenomenon are shown. Results comparing the probabilities of error predicted by the proposed algorithm as a function of dimensionality with experimental observations are then presented for aircraft and Landsat data. Results are obtained for both real and simulated data. Finally, two binary tree classification procedures that make use of the algorithm are presented to illustrate the usefulness of the procedure.

The Bayesian decision rule, with the assumptions of a 0-1 loss function, equal a priori probabilities, and multivariate normal distributions, is used as the decision rule in all experiments that involve classification.

Detailed training and test field descriptions for all the experiments conducted are found in Appendix F.
4.2 Experiments on Feature Selection Techniques

In this section, some experiments on different feature 
selection techniques are presented. The purpose of conduct- 
ing these experiments is to choose an effective feature 
selection technique, particularly when dealing with a small 
number of training samples. 

Experiment 4.1

Two classes, wheat and corn, are selected from multispectral scanner (hereafter referred to as MSS or aircraft) data of the 1971 Flightline 210 from the Corn Blight Watch Experiment, and classified. The data were collected on August 13, 1971. Part of the selected data is used for training and a much larger portion is used for testing. The number of features used for classification varies from one to twelve, and the number of training samples for each class is chosen to be much larger than the number of features (265 samples for wheat, 569 samples for corn). A principal components (Karhunen-Loève) transformation is applied to the data, and then three feature selection techniques are compared:

1) In the first feature selection method, the features are ordered according to the largest eigenvalues resulting from the K-L expansion. This method, referred to hereafter as the K-L ordering method, assumes that the best feature is the one corresponding to the largest eigenvalue of the mixture covariance matrix of the whole data set, the second best corresponds to the second largest eigenvalue, etc. This ordering imposes the condition that a feature subset with lower dimensionality is always a subset of another with higher dimensionality. The method thus depends only on the eigenvalues of the mixture covariance matrix, and ignores any among-class variabilities.

2) The second feature selection method is referred to as the Transformed Divergence method (13). The transformed divergence, $D_T$, is defined as follows:

$$D_T = 2000\left(1 - e^{-D/8}\right) \qquad (4.1)$$

where $D$ is the divergence of two normal distributions, defined as follows (12):

$$D = \frac{1}{2}\,\mathrm{tr}\left[(\Sigma_1 - \Sigma_2)(\Sigma_2^{-1} - \Sigma_1^{-1})\right] + \frac{1}{2}(M_1 - M_2)^T(\Sigma_1^{-1} + \Sigma_2^{-1})(M_1 - M_2) \qquad (4.2)$$

For a given dimensionality, the method chooses the feature subset of that dimensionality which gives the largest value of $D_T$. Unlike in the K-L method, a feature subset of lower dimensionality is not necessarily a subset of another with higher dimensionality. This method is applied to the data after they have been K-L transformed.

3) The third feature selection technique used is the
Bhattacharyya distance (16), defined by equation (2.9).
In this method, a simultaneous diagonalization technique
is applied to the covariance matrices of the two classes
(after a K-L transformation of the data), and the best
feature is then selected as the one corresponding to the
largest value of B as defined by equation (2.10). The
second best is the one corresponding to the second
largest B, and so on. As in the K-L method, a feature
subset of lower dimensionality is always a subset of
one with higher dimensionality. The transpose of the
resulting eigenvector matrix is then multiplied by the
observation vectors to transform the data, the mean
vectors and the covariance matrices are transformed,
and the data is classified.
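The transformed divergence criterion of method 2 can be sketched in Python as below. This is a minimal sketch, assuming NumPy and class statistics that have already been estimated (mean vectors `m1`, `m2` and covariance matrices `S1`, `S2`; all names are hypothetical). The exhaustive search in the hypothetical helper `best_subset_td` is one straightforward reading of "the feature subset with that dimensionality which gives the largest value"; the original experiments may have organized the search differently.

```python
import itertools

import numpy as np


def divergence(m1, S1, m2, S2):
    """Divergence D between two normal classes, following equation (4.2)."""
    S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
    dm = m1 - m2
    # Covariance term: (1/2) tr[(S1 - S2)(S2^-1 - S1^-1)]
    cov_term = 0.5 * np.trace((S1 - S2) @ (S2_inv - S1_inv))
    # Mean term: (1/2) (M1 - M2)^T (S1^-1 + S2^-1) (M1 - M2)
    mean_term = 0.5 * dm @ (S1_inv + S2_inv) @ dm
    return float(cov_term + mean_term)


def transformed_divergence(m1, S1, m2, S2):
    """Transformed divergence D_T, equation (4.1)."""
    return 2000.0 * (1.0 - np.exp(-divergence(m1, S1, m2, S2) / 8.0))


def best_subset_td(m1, S1, m2, S2, k):
    """Hypothetical helper: search every k-feature subset for the largest D_T."""
    best_score, best_subset = -1.0, None
    for subset in itertools.combinations(range(len(m1)), k):
        idx = list(subset)
        score = transformed_divergence(
            m1[idx], S1[np.ix_(idx, idx)], m2[idx], S2[np.ix_(idx, idx)]
        )
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score
```

Because of the exponential in (4.1), D_T saturates at 2000 as D grows, so subsets that already separate the two classes well receive nearly equal scores.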

Results are shown in Figure 4.1, which plots the
recognition accuracy (P_cc) as a function of
dimensionality. Of the three methods, the transformed
divergence gives the poorest performance. The K-L method
is better, but the best method is the Bhattacharyya
ordering, which saturates at a very low dimensionality.
Note that as dimensionality increases, the three curves
approach each other, until they all coincide when all
features are used (the probability of error is invariant
under any nonsingular linear transformation).

Figure 4.1 Classification Results of Data in
Experiment 4.1 Using Three Feature
Selection Techniques.


Experiment 4.2 

In this experiment, 20 samples each of wheat and corn
are chosen randomly from the training samples of
Experiment 4.1. The test samples are the same in both
experiments. Again, the same three feature selection
techniques described above are used. Classification
results are shown in Figure 4.2. Unlike the results in
Experiment 4.1, the Bhattacharyya ordering here gives
the poorest results. Further, it does not exhibit a
peaking effect, an effect that is expected when working
with such a small number of training samples. The
transformed divergence ordering does much better and
does exhibit a peaking effect; however, it fluctuates
considerably. The K-L ordering, on the other hand,
while giving slightly poorer results than transformed
divergence at low dimensionality, is better than the
other two techniques at high dimensionality and
fluctuates less.


Figure 4.2 Classification Results of Data
in Experiment 4.2 Using Three
Feature Selection Techniques (wheat and
corn, real data, 20 samples each; curves:
K-L (whole data set), Transformed Divergence,
and Bhattacharyya).
Experiment 4.3

Another two classes, corn and forest, are selected from
the same data set described in Experiment 4.1. Again, 20
samples per class are chosen randomly from a larger set of
training samples, and the three feature selection techniques
are compared. Results appear in Figure 4.3.

Again, we notice that the Bhattacharyya ordering performs
worse than the other two techniques and does not exhibit a
peaking effect. Transformed divergence gives better
results, but again fluctuates considerably. The K-L
ordering is superior to both, and fluctuates less.

It should be noted again that the K-L ordering we used
is based on the full data set. It depends on the
mixture covariance matrix of the full data set, and thus
ignores any between-class variabilities resulting from
differences between class covariance matrices. Because it
always depends on the full data set, the number of
samples used to estimate the mixture covariance matrix is
almost always large, and hence a good estimate is obtained.

The Bhattacharyya ordering used, on the other hand,
although it takes into account between-class
variabilities, depends heavily on the number of training
samples used to estimate the individual covariance
matrices of the classes at hand. Thus, as the number of
training samples decreases, poorer estimates of the
covariance matrices are obtained, leading to poorer
transformations.

Figure 4.3 Classification Results of Data
in Experiment 4.3 Using Three
Feature Selection Techniques (panel title:
corn and forest, simulated data).

It appears that the transformation obtained from the
simultaneous diagonalization technique is very sensitive to
the number of training samples used to estimate the
statistics of the classes at hand. While it produces superior
results when there are enough samples, it fails to do so
when the training samples are limited.

Indeed, Wu (50) published results showing that the
divergence criterion breaks down when the number of
training samples is small, and is no longer an effective
predictor of accuracy.
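The simultaneous diagonalization step of method 3, and the reason it is sensitive to the training set, can be seen in the sketch below. It assumes NumPy and assumes that equation (2.9) is the standard Bhattacharyya distance between two normal classes; the function names are hypothetical. Class 1 is whitened first, then the whitened class-2 covariance is diagonalized, so both covariance matrices become diagonal and each transformed feature contributes an additive term to B. Because the matrix `A` is built entirely from the estimated `S1` and `S2`, any estimation error from a small training set passes directly into the feature ordering.

```python
import numpy as np


def simultaneous_diagonalization(S1, S2):
    """Return (A, lam) with A.T @ S1 @ A = I and A.T @ S2 @ A = diag(lam)."""
    d, U = np.linalg.eigh(S1)
    W = U / np.sqrt(d)  # scale each eigenvector so that W.T @ S1 @ W = I
    lam, V = np.linalg.eigh(W.T @ S2 @ W)
    return W @ V, lam


def bhattacharyya_ordering(m1, S1, m2, S2):
    """Order the transformed features by their individual Bhattacharyya terms."""
    A, lam = simultaneous_diagonalization(S1, S2)
    delta = A.T @ (m1 - m2)  # mean difference along each transformed feature
    avg_var = (1.0 + lam) / 2.0  # class-1 variance is 1 after the transform
    B = delta**2 / (8.0 * avg_var) + 0.5 * np.log(avg_var / np.sqrt(lam))
    order = np.argsort(B)[::-1]  # largest B first
    return A[:, order], B[order]
```

Since the per-feature terms add up to the full Bhattacharyya distance, taking the first k columns of the returned matrix gives nested feature subsets, as described for method 3.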

The K-L ordering, while ignoring the among-class
variabilities in the scene, depends only on the number of
data points in the data set used to approximate the mixture
covariance matrix, and is otherwise independent of the
number of training samples used. Thus, while this ordering
sacrifices the information about the variability between
classes in the set, experimental results show that this
sacrifice is more than warranted when dealing with a small
number of training samples. While not claiming that the K-L
ordering gives optimal results, we think it is a very
effective procedure in the presence of few training
samples, one not surpassed by any other procedure that we
know of under these circumstances.
Based on the above, and on the fact that the K-L ordering
is a very efficient technique in that it reduces the number
of feature subsets that have to be searched through to only
the number of features present, it will be used as the
feature selection technique throughout the remainder of the
experiments.
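A minimal sketch of the K-L ordering follows, assuming NumPy and a pooled sample matrix `X` whose rows are pixels and whose columns are the original channels; the function names are hypothetical. The ordering is fixed by a single eigendecomposition of the mixture covariance, which is where the efficiency noted above comes from: only n candidate subsets (the first 1, 2, ..., n features) instead of all 2^n - 1.

```python
import numpy as np


def kl_ordering(X):
    """Eigenvalues and eigenvectors of the mixture covariance, largest first.

    X is an (n_samples, n_channels) array pooled over the whole data set,
    so class labels play no role in the ordering.
    """
    mixture_cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(mixture_cov)  # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def best_k_features(X, eigvecs, k):
    """Project onto the first k K-L features; subsets are nested by construction."""
    return X @ eigvecs[:, :k]
```

The same eigenvectors are applied to the training and test samples of every class, and the class statistics are then estimated on the projected data.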


4.3 Experiments on the Hughes Phenomenon 

In this section, some experimental results that
illustrate the Hughes phenomenon will be presented. The
objectives of conducting these experiments are to
demonstrate the existence of this phenomenon in remote
sensing applications, and to verify the hypothetical
explanation of it. Experiments will be performed on
aircraft and Landsat data, both simulated and real. In all
the following experiments, no results are obtained for the
dimensionality of one. Tabulated classification results
are found in Appendix D.


Experiment 4.4

The data set described in Experiment 4.1 is simulated
using the algorithm described in Section 2.4. Two classes,
corn and forest, are selected and 500 training samples are
chosen for each class. A larger, mutually exclusive set is
used for testing. The K-L method is used in ordering the
features, and the selected data is classified using the
best 2, 3, ..., 12 features. Subsequently, 5 training sets
are randomly chosen from the larger training set, each set
having 20 samples per class of corn and forest. The five
sets are classified, using the same test fields above, and
the average classification accuracy (sometimes referred to
as the probability of correct classification, or P_cc) is
calculated for each subset of features. Another 5 training
sets are then randomly chosen, this time with 13 samples per
class of corn and forest (the minimum number of samples
possible for 12 features without getting singular
covariance matrices). Again, the 5 sets are classified and
the average classification accuracy is calculated for each
feature subset. The results are then plotted in Figure 4.4.
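The figure of 13 samples for 12 features follows from a rank argument: a sample covariance matrix estimated from n samples, with the mean also estimated from those same samples, has rank at most n - 1, so at least p + 1 samples are needed for a nonsingular p x p matrix. A quick NumPy check on random Gaussian data illustrates this (the function name is hypothetical):

```python
import numpy as np


def sample_cov_rank(n_samples, n_features, seed=0):
    """Rank of the sample covariance of random Gaussian data (mean also estimated)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    S = np.cov(X, rowvar=False)  # centering on the sample mean costs one degree of freedom
    return int(np.linalg.matrix_rank(S))


for n in (12, 13, 20):
    print(n, "samples -> rank", sample_cov_rank(n, 12))
```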

Looking at Figure 4.4, it is seen that when the number
of training samples is adequate, as in the 500 samples per
class case, the probability of correct classification is a
monotonically non-decreasing function of dimensionality.
Since in a K-L ordering the information is concentrated in
the first few channels, we notice that after the best 5
features, the recognition accuracy tends to saturate.

When the number of training samples per class drops to
20, however, we see that not only does the accuracy drop
from the 500 samples case, but it also exhibits a slight
Hughes phenomenon. Although the curve has a maximum at
dimensionality 3, it is approximately constant until the
best 10 features, after which it starts decreasing, though
only slightly.

Figure 4.4 Experimental Classification
Results of Aircraft, Simulated
Data Using Different Numbers
of Training Samples.

The 13 samples per class case offers a dramatic change 
from the two other curves. There is a clear peaking effect 
here, with the curve reaching a maximum at dimensionality 5, 
after which it drops drastically. 

The results conform with the hypothetical curves of
Figures 2.1 and 2.2. The 20 samples and 13 samples curves
could be made smoother if more than 5 sets were averaged,
and hence they should be viewed as only approximations of
what the true curves look like. However, the trend these
curves point to is clear. In the presence of a limited set
of training samples, an increase in dimensionality can
result in a decrease in the classification accuracy, with
this effect disappearing as the number of training samples
increases.
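The averaging protocol used in Experiment 4.4 (and in the later experiments) can be sketched as follows. This is a hedged illustration, not the software used for these experiments: it assumes a Gaussian maximum likelihood classifier with equal priors, and it assumes the columns are already in K-L order so that the "best k features" are simply the first k columns. All function names are hypothetical.

```python
import numpy as np


def fit_gaussian(X):
    """Sample mean and covariance; atleast_2d keeps the k = 1 case a matrix."""
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False))


def log_likelihood(X, mean, cov):
    """Gaussian log-likelihood up to a constant shared by all classes."""
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (logdet + maha)


def pcc(train, test, k):
    """Probability of correct classification using the first k features."""
    params = [fit_gaussian(Xc[:, :k]) for Xc in train]
    correct, total = 0, 0
    for label, Xt in enumerate(test):
        scores = np.stack([log_likelihood(Xt[:, :k], m, S) for m, S in params])
        correct += np.sum(scores.argmax(axis=0) == label)
        total += len(Xt)
    return correct / total


def average_pcc(pool, test, n_per_class, dims, n_sets=5, seed=0):
    """Mean and standard deviation of P_cc over n_sets random training sets.

    pool and test are lists with one (n_samples, n_features) array per class;
    each training set draws n_per_class samples per class from pool.
    """
    rng = np.random.default_rng(seed)
    results = np.empty((n_sets, len(dims)))
    for s in range(n_sets):
        train = [Xc[rng.choice(len(Xc), n_per_class, replace=False)] for Xc in pool]
        results[s] = [pcc(train, test, k) for k in dims]
    return results.mean(axis=0), results.std(axis=0)
```

Calling `average_pcc` with `n_per_class=13` and again with a much larger value gives the two kinds of curves compared in this section, together with the per-dimensionality spread across training sets.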


Experiment 4.5

The same aircraft data set as that used in Experiment
4.1 is used, but without any simulation. 400 samples each
of corn and forest are selected for training, and a larger,
separate set is used for testing. Again, 5 different
training sets of 20 and 13 samples per class are randomly
chosen from the original training set and classified. The
average classification results for each feature subset are
calculated and plotted. Results appear in Figure 4.5.

The curves in Figure 4.5 are not as smooth as they are
in Figure 4.4. This is attributed to the fact that we are
working with real data, which does not satisfy our
assumptions as well as the simulated data does. Still, the
13 samples curve performs generally worse than the other
two curves and drops dramatically in accuracy, whereas
the 400 samples curve appears to saturate almost from the
start. The 20 samples curve appears to have a slight
peaking effect, although the curve is very noisy.

Experiment 4.6

The data set used in this experiment is obtained from
Landsat, flown over Henry County, Indiana. To obtain a data
set with more than the 4 features available from Landsat on
any particular date, four data sets flown over the site at
different times are used. The data was collected on June 9,
July 16, August 20, and September 26, all in 1978. The data
is concatenated, and a K-L transformation is performed on
it. Simulated data, more precisely meeting such assumptions
as normality, is generated, and the first 12 channels are
used for classification. We will refer to this data as
multitemporal data to indicate that it is collected over
different times.


Figure 4.5 Experimental Classification Results
of Aircraft, Real Data Using Different
Numbers of Training Samples.
Two classes, corn and soybeans, are selected with 250
samples per class for training, and a larger independent set
for testing. Again, 5 different training sets of 20 and 13
samples per class are chosen from the original training set
and classified. Results are averaged and plotted in Figure
4.6.


The same results obtained in the previous two experi- 
ments are again evident. Note that even with 20 or 13 sam- 
ples per class, the accuracy obtained is very close to that 
obtained by using all the available training samples. This
is because the two classes chosen are highly separable and
thus are easily distinguishable even when a small number of
training samples is used to estimate their statistics.


Experiment 4.7 

The same data set as experiment 4.6 is used, but with- 
out any simulation. Two classes, corn and soybeans, are 
selected with 250 samples per class used for training, and a
larger, separate set for testing. Again, 5 different
training sets of 20 and 13 samples per class are randomly 
chosen from the original training set and classified. 
Results are averaged and plotted in Figure 4.7. 
Figure 4.6 Experimental Classification Results
of Landsat, Multitemporal, Simulated
Data Using Different Numbers of
Training Samples (horizontal axis: best n channels).

Figure 4.7 Experimental Classification Results
of Landsat, Multitemporal, Real
Data Using Different Numbers of
Training Samples (curves: 250, 20, and 13 samples/class;
horizontal axis: best n channels).
The same observations made in the three previous
experiments apply here. There is a drastic drop in accuracy
when 13 samples are used, a slight one when 20 samples are
used, and no drop when 250 samples are used.

Summarizing the results of the last four experiments,
we see that there is a definite Hughes phenomenon in the
presence of a limited number of training samples compared to
the number of features used. Further, as the number of
samples increases, the accuracy for any given dimensionality
increases, and the peak in the curve shifts to the right,
i.e., the peaking effect takes place at a higher
dimensionality, as is seen in Figures 4.4-4.7.

Studying Figures 4.4-4.7 reveals that the region between
13 samples and 20 samples is a very sensitive one when
working with a maximum dimensionality of 12. While there is
a sharp decline in accuracy at 13 samples per class, there
is only a slight one at 20 samples per class. Another point
to note is that the 20 and 13 samples are chosen from
spectrally homogeneous classes, and so a very large number
of samples is not needed to estimate the statistics of
these classes. In a practical situation, the 20 and 13
samples curves might not be as close to the curves with
large numbers of training samples as they are in these
experiments.
The results of the last four experiments were a factor
in choosing the empirical formula, equation (3.61),
discussed in Section 3.2.3. A formula was sought that takes
into account the sensitivity to the number of training
samples, as well as other factors that were discussed
earlier.


4.4 Experiments Comparing Algorithm and Experimental 
Results 

In this section, several experiments will be conducted
to assess the performance of the proposed algorithm. Again,
aircraft and Landsat data are used, both simulated and real,
and the number of training samples used will be varied. But
first, we will reproduce the results obtained by Fukunaga
and Krile (64) to verify the validity of the algorithm.


Experiment 4.8

The data set used by Fukunaga and Krile is described in
detail in Marill and Green (12). The data is simulated, and
has two classes and eight features. Each class has 200
training samples, and both the exact, or true, and the
algorithm recognition rates are calculated. The true
recognition rates are not recalculated here, but are
reproduced from Fukunaga and Krile, who used numerical
integration to arrive at them.
Two methods used by Fukunaga and Krile are employed
here: the normal assumption, discussed briefly in Section
3.2.1, and the modified gamma assumption, discussed in
Section 3.2.2 and used throughout this research. The
Bhattacharyya distance was used by Fukunaga and Krile, and
although we have shown it to have limitations, it is also
used here as the criterion for ordering the features.
Results appear in Figure 4.8.

The results show that the modified gamma assumption
method is a reasonable approximation of the true probability
of correct classification. The normal assumption, though,
does not give a good approximation of P_cc; hence it is
not used further.

While in this experiment the modified gamma assumption
is compared to the true probability of error, in actual
practice the true probability of error cannot be calculated
because the underlying distributions are not known.
Therefore, in the following experiments, the proposed
algorithm is compared to an average of five classifications
obtained from five different training sets having the same
number of training samples. This average classification
serves as an estimate of the "true" error curve. This fact
should be remembered, as the experimental curves obtained
are not as "smooth" as the true curves would be expected to
be. The algorithm curves, on the other hand, being
dependent, among other things, on the number of training
samples in an average way, are expected to be "smoother"
than the experimental ones.


Before we embark on studying the next experiments, it
is appropriate at this point to look at a flowchart
describing the proposed modified algorithm. This is shown
in Figure 4.9. This figure is to be compared to Figure 3.2,
Fukunaga and Krile's algorithm, to see the changes that
are made.

Experiment 4.9 (Aircraft, Simulated Data, 20 Samples per
Class)

The simulated aircraft data set used in Experiment 4.4
is used here. Two classes, corn and forest, are used. The
experimental 20 samples per class curve in Figure 4.4 is
plotted again in Figure 4.10, together with the
approximation to the probability of correct classification
predicted by the proposed algorithm. Also plotted in
Figure 4.10 are the standard deviations, for each feature
subset, of the five different classifications performed.

We see that the algorithm is a good approximation to
the experimental curve. The approximation is not as good at
lower dimensionalities as it is at higher ones, because the
assumptions the algorithm makes hold better at higher
dimensionalities. However, the two curves do peak at the
same dimensionality and, more importantly, they have a
similar shape. Both remain relatively constant for a while
and then start decreasing at about the dimensionality of 8.

Figure 4.9 A Flowchart of the Modified Algorithm.

Figure 4.10 Classification Results of Aircraft,
Simulated Data, Using 20 Samples
per Class (curves: experimental and algorithm).

Examining the standard deviations of P_cc, it is
observed that in general they have an increasing trend as
the dimensionality increases. In other words, the curves
indicate that the variance of the probability of error
seems to increase with increasing dimensionality. This
agrees with the hypothetical explanation given of the
Hughes phenomenon, namely that the accuracy of the
estimated statistics decreases with increasing
dimensionality (i.e., the estimates become more random,
increasing the variance of the error), and that when this
effect outweighs the increase in separability between
classes due to increasing dimensionality, a peaking effect
is observed. As the number of samples is decreased, larger
increases in the variance of error are expected.


Experiment 4.10 (Aircraft, Simulated Data, 13 Samples per
Class)

The same example used in Experiment 4.9 is used again,
but with 13 samples per class used for training. The
experimental curve of Experiment 4.4 is reproduced, together
with the curve predicted by the algorithm. The standard
deviation of P_cc is again plotted. Results appear in
Figure 4.11.


Again, the curve predicted by the algorithm is a better
approximation of the experimental curve at high
dimensionality. The experimental curve, however, is not very
sensitive to dimensionality at lower values, and thus a
small ambiguity in where the peak occurs can be afforded.
Still, both curves predict a peak at 3. The standard
deviation of the error again has an increasing trend as
dimensionality increases.


Experiment 4.11 (Aircraft, Real Data, 20 Samples per Class)

The example used in Experiment 4.5 is repeated. Again,
two classes, corn and forest, are used from the aircraft,
real data set. Twenty samples per class are used for
training, and five different sets of training samples are
classified and averaged. The average is then compared to the
algorithm performance. Results appear in Figure 4.12.

The experimental curve has a large error variance, as
can be seen from the curve, and does not seem to follow
any general pattern, although it decreases consistently
after dimensionality 9. It is interesting to compare this
curve with Figure 4.10, where the same conditions exist
except that the data is simulated. Because simulated data
satisfies the assumptions made about the distributions of
classes, it produces results that conform more closely to
theory than real data does. The algorithm performance
appears to be closer to what is expected, although in this
case it does not quite follow the experimental curve. This
"randomness" of the experimental curve is made more evident
by the standard deviations of P_cc, which do not seem to
follow any general pattern and are all relatively large.
This is a clear example of a case where deviations from the
assumptions may obscure the action of a newly proposed
algorithm.

Figure 4.11 Classification Results of Aircraft,
Simulated Data, Using 13 Samples
per Class.

Figure 4.12 Classification Results of Aircraft,
Real Data, Using 20 Samples per Class.


Experiment 4.12 (Aircraft, Real Data, 13 Samples per Class)

The same example used in Experiment 4.11 is used here,
with 13 samples per class for training. Results are shown
in Figure 4.13.

Experimental and algorithm results here are very close.
Both peak at 3, and both are very close at high
dimensionalities. The standard deviations of the errors are
also increasing in general, particularly at high
dimensionality. It is interesting to note that the standard
deviation in almost all of the above four experiments starts
increasing notably at about the same place where the
probability of correct classification starts dropping
sharply. This supports the idea that at these
dimensionalities, the randomness in the estimated statistics
is so large that it pulls the curve down.


Experiment 4.13 (Landsat, Multitemporal, Simulated Data, 20
Samples per Class)

The data set used in this experiment is the same as
that used in Experiment 4.6. It is obtained from Landsat,
with four dates concatenated so that more features are
available. The 20 samples per class curve of Figure 4.6 is
reproduced in Figure 4.14, together with the curve predicted
by the algorithm.

The algorithm curve seems to drop in accuracy faster 
than the experimental curve, but both peak at around 4. The 
standard deviation of error also increases as more features 
are used. 


Experiment 4.14 (Landsat, Multitemporal, Simulated Data, 13
Samples per Class)


The same data set used in Experiment 4.13 is used, but
with 13 samples per class for training. Results appear in
Figure 4.15. The increase in the variance of error with
increasing dimensionality is very noticeable here. Again,
the same observations apply, with both curves starting to
drop in accuracy at the same dimensionality.

Figure 4.14 Classification Results of Landsat,
Multitemporal, Simulated Data,
Using 20 Samples per Class (curves:
experimental and algorithm).


Experiment 4.15 (Landsat, Multitemporal, Real Data, 20 Samples
per Class)

The Landsat data set is again used, but without any
simulation. 20 samples per class are used for training, and
classification results are averaged and plotted in Figure
4.16.


While the algorithm predicts a somewhat better
performance than the experimental curve, both have the same
shape, and both are fairly constant until the first 7 or 8
features. This is because the two classes in this set, corn
and soybeans, are largely separable, and hence the increase
in the variance of the error with increasing dimensionality
does not outweigh the large separability between these two
classes.
Figure 4.15 Classification Results of Landsat,
Multitemporal, Simulated Data,
Using 13 Samples per Class.

Figure 4.16 (Landsat, multitemporal, real data;
20 samples/class).
Experiment 4.16 (Landsat, Multitemporal, Real Data, 13 Samples
per Class)

The Landsat, real data set is used in this experiment
with 13 samples per class for training. Results are shown
in Figure 4.17. The two curves have the same shape, and
peak at the same place, 4, although again the algorithm
predicts a better performance than does the experimental
curve. The variance of error is again seen to increase
with the number of features used.


To summarize the results of the last eight experiments
(4.9-4.16), the probabilities of error predicted by the
proposed algorithm as a function of dimensionality were
compared to experimental observations for aircraft and
Landsat data. Results are obtained for both simulated and
real data, using 20 and 13 samples per class for training.
For each case, five different training sets are used, and
classification results are averaged over these five sets.
The standard deviations of errors for each feature subset
are also plotted.

Results indicate that in most cases the algorithm
predicts the best, or near best, subset of features to be
used. While it does not always closely predict the actual
classification accuracies obtained from the experimental
average curve, in most cases it has the same shape as the
experimental curve and seems to follow any trends in
performance that the experimental curve undertakes. Since
the objective behind the algorithm is to predict the best
feature dimensionality and specific subset to be used in
classification, rather than to predict the probability of
error itself, the fact that the algorithm does not always
accurately predict this probability of error is not of
serious concern.

Figure 4.17 Classification Results of Landsat,
Multitemporal, Real Data, Using
13 Samples per Class (curves:
experimental and algorithm).

The standard deviations plotted seem to indicate that
in general, an increase in dimensionality results in an
increase in the variance of error, that increase becoming
highly noticeable at high dimensionality, when the
randomness in the estimated statistics, given a limited set
of training samples, is large.


The next step is to incorporate this algorithm in a
binary tree classification procedure, using more than two
classes, and assess its performance. This is done in
Section 4.5.
4.5 Experiments on a Binary Tree Classification Procedure

In this section, two data sets will be classified in a
binary tree classification procedure, using the proposed
algorithm to predict the optimal features at every node.

A complete design of a binary tree classification
procedure should address the problem of how to separate the
nodes in the tree effectively. Separations should be sought
that lead to meaningful classes at the intermediate and
terminal nodes. This problem should be thoroughly studied
before a solution can be arrived at.

It is not the purpose of this research to address this
problem in any detail. Therefore, no attempt has been made
here to dictate a particular procedure or claim any optimal,
or close to optimal, one. The procedure that will be used
is heuristic; the purpose of conducting the next two
experiments is to illustrate the usefulness of the proposed
algorithm in predicting the optimal features to be used at
every node. The problem of how to separate the nodes is
left as a topic for future research.

Experiment 4.17

The Landsat, multitemporal, real data set used in
Experiment 4.6 is used here again. Three informational
classes exist in the scene: corn, soybeans, and other. 13
samples per class are used for training, creating 3
spectral classes. The reason this is done is that in
actual practice, it is almost impossible to distinguish
spectral classes with only 13 training samples per class. A
much larger, separate set is used for testing (all training
and test field descriptions are found in Appendix F). The
binary tree is constructed by using a bottom-up procedure,
combining the most separable classes. The criterion for
measuring separability is that used by Whitsitt (9), and is
defined as follows:

D_{erf} = \mathrm{erf}\left((2B)^{1/2}\right)                      (4.3)

where B is the Bhattacharyya distance and erf(·) is the
Gaussian error function. Whitsitt found that this measure
is less ambiguous and more linear than the measure B. The
measure is calculated using the first 12 features after a
Karhunen-Loeve expansion is performed on the data. After
the tree is constructed this way, the proposed algorithm is
used to predict the optimal features to be used at every
node.
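Whitsitt's measure in equation (4.3) can be sketched as below. This assumes NumPy, the standard library's `math.erf`, and the standard Bhattacharyya distance between two normal classes (assumed here to match equation (2.9), which is not reproduced in this section). The function names are hypothetical; the helper `pairwise_d_erf` only tabulates the measure for every pair of classes and does not implement the bottom-up merging rule itself.

```python
import math

import numpy as np


def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance B between two normal classes."""
    S = (S1 + S2) / 2.0
    dm = m1 - m2
    mean_term = dm @ np.linalg.solve(S, dm) / 8.0
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return float(mean_term + cov_term)


def d_erf(m1, S1, m2, S2):
    """Equation (4.3): D_erf = erf((2B)^(1/2)), bounded above by 1."""
    b = max(0.0, bhattacharyya(m1, S1, m2, S2))  # guard against tiny negative round-off
    return math.erf(math.sqrt(2.0 * b))


def pairwise_d_erf(stats):
    """Symmetric matrix of D_erf for a list of (mean, covariance) pairs."""
    n = len(stats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = d_erf(*stats[i], *stats[j])
    return D
```

Because erf saturates, D_erf compresses very large values of B toward 1, which is one way to see why it behaves more linearly than B itself over the range of interest.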


The binary tree that resulted from the above procedure 
is shown in Figure 4.18. The algorithm predicts an optimal 
feature subset of 4 at the top, and a subset of 2 at the 
intermediate node. These appear below each node. Inside 
the node, the classes present are shown together with the 
total number of training samples present. 
Figure 4.18 Binary Tree Design Structure of
Landsat, Multitemporal, Real Data,
Using 13 Training Samples per Class,
With Numbers Inside Nodes Indicating
Number of Training Samples Used.
(Legend: C corn, S soybeans, O other.)
A single-stage classification is then performed on the
data using feature subsets of 2 to 12. This is done to
compare the performance of the binary tree procedure to that
of each of the feature subsets.

Results are plotted in Figure *1.19. The classification 
result obtained from the binary tree procedure is drawn in a 
dotted line across the page only to compare against the sin- 
gle-stage curve, and does not imply that all the feature 
subsets were used, or that the classification result is the 
same for all feature subsets. 

The results indicate that using three classes, the sin- 
gle-stage curve has a peak at *1, and that by using all 
twelve features, the result is much poorer. The binary tree 
procedure, on the other hand, results in a classification 
accuracy that is almost as good as the best result obtained 
from using the best feature subset (which is unknown in an 
actual practice situation) in a single-stage classification. 
Thus, it appears that the algorithm is effective in predict- 
ing the best features to be used at each node. 


Experiment 4.18

The aircraft, real data set used in Experiment 4.1 is used here. The data set has seven informational classes.
[Figure 4.19: Single-Stage and Binary Tree Classification Results of Landsat, Multitemporal, Real Data (3 classes), Using 13 Training Samples per Class. Horizontal axis: best n channels, n = 1 to 12.]
In this experiment, supervised clustering (discussed in Section 1.2.1) is used to obtain 9 spectral classes, with an adequate number of training samples per class. Thirteen samples per class are then randomly chosen from the larger training set, so that each set of these samples is known to come from one spectral class. The bottom-up procedure described in Experiment 4.17 is then used to build the binary tree. The one exception is the class water, which is separated from the other classes at the beginning because experience has shown that its spectral properties are very different from those of the agricultural classes. The proposed algorithm is then used to predict the best features at each node. A single-stage classification is performed using feature subsets of 2 to 12, and the same statistics are then used in the binary tree classification procedure.

The resulting tree appears in Figure 4.20. Figure 4.21 shows the classification results obtained from the single-stage and the binary tree classifiers.

The binary tree procedure, using the proposed algorithm, performs better than any feature subset does in a single-stage procedure. The Hughes phenomenon is very evident here: the overall classification accuracy for seven informational classes (9 spectral) drops sharply from a high of about 69% to a low of about 43%.


[Figure 4.20: Binary Tree Design Structure of Aircraft, Real Data, Using 13 Training Samples per Class.]

[Figure 4.21: Single-Stage and Binary Tree Classification Results of Aircraft, Real Data (9 classes), Using 13 Training Samples per Class.]
Summarizing the results of the last two experiments, the proposed algorithm is shown to be effective in predicting feature subsets that lead to the maximum, or near-maximum, accuracy attainable when the Karhunen-Loeve expansion is used to order the features.

It is worth noting that, according to a common belief, few features need be used at the top of the tree to separate classes, and more features need be used deeper in the tree to distinguish between less separable classes. However, the statistics at each node are estimated only from the training samples of the classes that node contains, so the number of training samples available decreases toward the bottom of the tree. When training samples are inadequate, fewer features should therefore be used at the bottom to avoid the Hughes phenomenon. This is evident in the last two examples, particularly in Figure 4.20, where many features are used at the top but only a few at the bottom.

One further point is worth mentioning. When a node is divided into two nodes with unequal numbers of training samples, one of them might have inadequate training samples while the other has adequate ones. This situation is illustrated in Figure 4.20, where the top node is divided into water and everything else. In this case, the number of features used is "intermediate". It depends on how much the accuracy of the estimated statistics degrades for the node with the inadequate number of training samples.
CHAPTER 5

SUMMARY AND CONCLUSIONS
5.1 Summary of Results

The purpose of this research has been to develop an error estimator that will predict when or if the Hughes phenomenon occurs in multispectral data. Several significant results were arrived at and are summarized below.

The probability of error was studied through the likelihood ratio function. This offered the convenience of working with a one-dimensional variable, regardless of the number of features used in estimating the training statistics. An algorithm was then developed to estimate the statistics of this function, taking into account the number of training samples used to estimate those statistics. Several theoretical and experimental results were obtained on the Hughes phenomenon. These showed how the probability of error depends on the number of training samples and features used. The algorithm developed in Chapter 3 was shown to predict a suitable feature subset to be used at each node in a binary tree procedure. In Chapter 4, the algorithm was tested by comparing it with experimental observations under different conditions, and it was used in two binary tree classification procedures to demonstrate its practicality.

Some results were also shown that demonstrate the usefulness of the K-L expansion over the whole data set for ordering features when only a limited set of training samples is available. The procedure is used extensively in the research, and under the conditions given it appears to have less variability than other procedures.
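The whole-data-set K-L ordering can be sketched as follows. The transform is estimated from the covariance of all pixels rather than from the scarce training samples, and the features are ordered by decreasing eigenvalue. This is a minimal sketch. Centering on the overall mean and using the covariance matrix rather than the correlation matrix are assumptions, and `kl_expansion` is a hypothetical name.

```python
import numpy as np


def kl_expansion(data, n_features):
    """Karhunen-Loeve (principal component) transform estimated from ALL pixels.

    data: (N, p) array of pixel vectors pooled over the whole data set (labels not needed).
    Returns the first n_features K-L features and the eigenvalues in decreasing order.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # p x p covariance of all pixels
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs[:, :n_features], eigvals
```

Under these assumptions, using "the first 12 features after a Karhunen-Loeve expansion" would correspond to `kl_expansion(pixels, 12)`.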

Certain parts of the algorithm developed are heuristic in nature, and the reasons why more theoretical solutions were not pursued have been explained. These heuristic procedures often make it difficult to verify the validity of the algorithm's strategy. The basic point is that when a practical solution and theoretical perfection cannot both be achieved, one tends to choose the former. The experimental results in Chapter 4 demonstrated that the algorithm can be used in practice to yield optimal, or near-optimal, results.


5.2 Suggestions for Further Research 

The main objective behind developing the error algorithm is to use it as a feature selection technique in a multistage classification procedure. In particular, the algorithm was developed to be used in a binary tree procedure. Such a procedure requires an effective method for splitting the classes at each node, in addition to the choice of the optimal features at each node. This question was addressed only superficially in this research and could serve as the topic of another research project. An effective design for splitting the nodes, coupled with the developed algorithm to choose the features, should lead to much higher accuracies than a single-stage classifier.

Several strategies developed in the research were heuristic in nature. Appendix B explains why it is difficult to calculate theoretically the probability density functions of the variances of the likelihood ratio function given either class one or class two. If such a derivation were possible, it would give a much clearer idea of how the number of training samples affects the variance of the likelihood ratio function. The error algorithm could then predict the probability of error more accurately when only a limited number of training samples is available.

The K-L expansion was used extensively as a feature selection technique in the presence of few training samples. This choice was based on experimental observations, but it necessarily meant sacrificing the information contained in the between-class variability. A more detailed study of how several feature selection techniques relate to the number of training samples could be very helpful.
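Appendix A

Generation of Normally Distributed Samples

Let $U_1$ and $U_2$ be two independent random variables, identically distributed Uniform(0,1). Then, let

$$Z_1 = (-2\ln U_1)^{1/2}\cos 2\pi U_2 \qquad (A.1)$$

$$Z_2 = (-2\ln U_1)^{1/2}\sin 2\pi U_2 \qquad (A.2)$$

Then $Z_1$ and $Z_2$ are independent and identically distributed normal(0,1).

Proof:

$$f(U_1, U_2) = \begin{cases} 1, & 0 < U_1 < 1,\ 0 < U_2 < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (A.3)$$

is the probability density function of two independent uniforms. Inverting (A.1) and (A.2) gives

$$U_1 = \exp\left[-\tfrac{1}{2}(Z_1^2 + Z_2^2)\right] \qquad (A.4)$$

$$U_2 = \frac{1}{2\pi}\tan^{-1}\frac{Z_2}{Z_1} \qquad (A.5)$$

The Jacobian of the transformation is:

$$J = \frac{\partial(U_1, U_2)}{\partial(Z_1, Z_2)} = -\frac{1}{2\pi}\exp\left[-\tfrac{1}{2}(Z_1^2 + Z_2^2)\right]$$

$$f(Z_1, Z_2) = f(U_1, U_2)\,|J| = \begin{cases} \dfrac{1}{2\pi}\exp\left[-\tfrac{1}{2}(Z_1^2 + Z_2^2)\right], & 0 < U_1 < 1,\ 0 < U_2 < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (A.6)$$

The joint density (A.6) factors into the product of two N(0,1) densities. Hence $Z_1 \sim N(0,1)$, $Z_2 \sim N(0,1)$, and $Z_1$ and $Z_2$ are independent.

The side conditions give $-\infty < Z_1 < \infty$ and $-\infty < Z_2 < \infty$. Strictly speaking, $Z_1$ cannot equal zero; however, $\Pr(Z_1 = 0) = 0$, since we are working with continuous densities.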
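Equations (A.1) and (A.2) translate directly into Python. In this minimal sketch, Python's `random.Random` stands in for the uniform generator. The program in Appendix E appears to call the SSP routine RANDU and to keep only the cosine form (A.1). `standard_normals` is a hypothetical helper name.

```python
import math
import random


def box_muller(u1, u2):
    """Eqs. (A.1)-(A.2): two independent Uniform(0,1) values -> two independent N(0,1) values."""
    r = math.sqrt(-2.0 * math.log(u1))   # radius; requires 0 < u1 <= 1
    theta = 2.0 * math.pi * u2           # angle, uniform on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)


def standard_normals(count, seed=None):
    """Generate `count` N(0,1) deviates, two per pair of uniforms (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        u1 = 1.0 - rng.random()          # random() lies in [0, 1); this maps it into (0, 1]
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:count]
```

Keeping both outputs, as here, yields two deviates per pair of uniforms. Discarding the sine term, as the listing appears to do, costs one extra uniform per deviate.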

To test the effectiveness of the pseudo-random vectors in the multivariate case, random vectors distributed $N(0, I_p)$ were generated and then tested with a Kolmogorov-Smirnov test. Because the multivariate normal cdf is difficult to evaluate, the sum of squares of each vector was calculated and compared with the $\chi^2_p$ distribution.

For sample sizes greater than 100, the pseudo-random vectors were distributed properly. For sample sizes less than 100, the K-S test is not valid. Since we would generally (over an entire area) be working with more than 100 points per class, this was not pursued further.

In addition, the sample covariance matrices were tested for homogeneity against the true class statistics. For sample runs of up to 2000 points, there were no significant differences at the $\alpha = 0.10$ level.
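The sum-of-squares check can be sketched as follows. If a vector is $N(0, I_p)$, the sum of the squares of its components is $\chi^2_p$, so a one-sample K-S test against the $\chi^2_p$ cdf checks the generator. Several parts are assumptions. NumPy's `standard_normal` stands in for the pseudo-random vectors. The $\chi^2_p$ cdf is computed from the series for the regularized incomplete gamma function. The large-sample 5% critical value $1.36/\sqrt{n}$ is for illustration only, since the text does not state the level used for the K-S test. The helper names are hypothetical.

```python
import math

import numpy as np


def chi2_cdf(x, p):
    """Chi-square cdf with p d.f.: regularized lower incomplete gamma P(p/2, x/2), by series."""
    if x <= 0.0:
        return 0.0
    a, t = 0.5 * p, 0.5 * x
    term = 1.0 / a                      # n = 0 term of sum_n t^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)
        total += term
    return min(1.0, math.exp(-t + a * math.log(t) - math.lgamma(a)) * total)


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    f = np.array([cdf(v) for v in x])
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - f), np.max(f - (i - 1) / n)))


# Usage: sums of squares of N(0, I_p) vectors should follow chi-square with p d.f.
rng = np.random.default_rng(1)
p, n = 4, 2000
z = rng.standard_normal((n, p))         # stands in for the appendix's pseudo-random vectors
s = np.sum(z * z, axis=1)
d = ks_statistic(s, lambda v: chi2_cdf(v, p))
critical = 1.36 / math.sqrt(n)          # assumed: large-sample 5% critical value
print(f"D = {d:.4f}, 5% critical value = {critical:.4f}")
```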
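Appendix B

On the Probability Density Functions of $\hat\sigma_1^2$ and $\hat\sigma_2^2$

Let us look at the expressions for $\hat\sigma_1^2$ and $\hat\sigma_2^2$. From (3.55) and (3.56), we have:

$$\hat\sigma_1^2 = 2\left(\operatorname{tr}\left[(I - \hat\Sigma_1\hat\Sigma_2^{-1})^2\right] + 2(\hat M_2 - \hat M_1)^T\hat\Sigma_2^{-1}\hat\Sigma_1\hat\Sigma_2^{-1}(\hat M_2 - \hat M_1)\right) \qquad (B.1)$$

$$\hat\sigma_2^2 = 2\left(\operatorname{tr}\left[(\hat\Sigma_1^{-1}\hat\Sigma_2 - I)^2\right] + 2(\hat M_2 - \hat M_1)^T\hat\Sigma_1^{-1}\hat\Sigma_2\hat\Sigma_1^{-1}(\hat M_2 - \hat M_1)\right) \qquad (B.2)$$

To be able to calculate the probability density functions of $\hat\sigma_1^2$ and $\hat\sigma_2^2$, one has to know those of $\hat M_1$, $\hat M_2$, $\hat\Sigma_1$, and $\hat\Sigma_2$.

Before we proceed, we make the following assumptions:

1. $M_1$ and $M_2$, the means of the two classes at hand, are constant. Experience has shown that one can estimate these two quantities relatively accurately with a small number of training samples. Henceforth, we will assume $\hat M_1$ and $\hat M_2$ to be constants and not random variables.

ORIGINAL PAGE IS OF POOR QUALITY.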
2. $\hat\Sigma_1$ and $\hat\Sigma_2$ are independent. We will ignore any relationships that might exist between the covariance matrices of the two classes.

Theorem B.1

If $N_1$ and $N_2$ are the numbers of samples used in estimating $\hat\Sigma_1$ and $\hat\Sigma_2$, then $n_1\hat\Sigma_1$ and $n_2\hat\Sigma_2$ are each Wishart distributed, with parameters $\Sigma_1, n_1$ and $\Sigma_2, n_2$, respectively, where $n_i = N_i - 1$.

Proof

See reference (B.1), p. 159.

Thus, $\hat\Sigma_i$, $i = 1, 2$, has the following Wishart distribution:

$$f(\hat\Sigma_i) = \frac{n_i^{n_i p/2}\,|\hat\Sigma_i|^{(n_i-p-1)/2}\exp\left(-\tfrac{1}{2}n_i\operatorname{tr}\Sigma_i^{-1}\hat\Sigma_i\right)}{2^{n_i p/2}\,\pi^{p(p-1)/4}\,|\Sigma_i|^{n_i/2}\prod_{k=1}^{p}\Gamma\left(\tfrac{1}{2}(n_i+1-k)\right)} \qquad (B.3)$$

where $p$ is the number of dimensions.

Theorem B.2

$\hat\Sigma_i^{-1}$ is again Wishart distributed, with parameters $\Sigma_i^{-1}/n_i$ and $n_i$.

Proof

See reference (B.2).

Theorem B.3

If $A$ is distributed according to the Wishart distribution $W(\Sigma, n)$, then $B = CAC'$ is also Wishart distributed, $W(\Psi, n)$, where $\Psi = C\Sigma C'$.

Proof

See reference (B.1), p. 162.

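The above theorems show that $\hat\Sigma_1$, $\hat\Sigma_2$, $\hat\Sigma_1^{-1}$, and $\hat\Sigma_2^{-1}$ are Wishart distributed. Further, under the simultaneous diagonalization transformation, $\Sigma_1$ becomes the identity matrix $I$ and $\Sigma_2$ becomes a diagonal matrix $\Lambda$, and by Theorem B.3 the transformed covariance matrices are also Wishart distributed. Hence, $\hat\Sigma_1$ is transformed into a diagonal matrix $\hat I$ that is distributed $W(I/n_1, n_1)$. We will call the diagonal elements of this matrix $\hat X_i$. Similarly, $\hat\Sigma_2$ is transformed into a diagonal matrix $\hat\Lambda$ that is distributed $W(\Lambda/n_2, n_2)$. We will call the diagonal elements of this matrix $\hat Y_i$. $\hat\Sigma_1^{-1}$ is transformed into a diagonal matrix $\hat I^{-1}$ distributed $W(I/n_1, n_1)$, and $\hat\Sigma_2^{-1}$ is transformed into a diagonal matrix $\hat\Lambda^{-1}$ distributed $W(\Lambda^{-1}/n_2, n_2)$.

Thus, after applying the simultaneous diagonalization transformation, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ become:

$$\hat\sigma_1^2 = 2\sum_{i=1}^{p}\left[1 - 2\frac{\hat X_i}{\hat Y_i} + \frac{\hat X_i^2}{\hat Y_i^2} + 2d_i^2\frac{\hat X_i}{\hat Y_i^2}\right] \qquad (B.4)$$

$$\hat\sigma_2^2 = 2\sum_{i=1}^{p}\left[\frac{\hat Y_i^2}{\hat X_i^2} - 2\frac{\hat Y_i}{\hat X_i} + 1 + 2d_i^2\frac{\hat Y_i}{\hat X_i^2}\right] \qquad (B.5)$$

where $d_i$ is the $i$th component of the mean difference $\hat M_2 - \hat M_1$ in the transformed coordinates.

Note that equations (B.4) and (B.5) are modified versions of equations (3.55) and (3.56).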
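The simultaneous diagonalization and the two forms of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ can be sketched with NumPy. `sigma_sq_direct` evaluates (B.1) and (B.2), and `sigma_sq_diagonal` evaluates (B.4) and (B.5). When the transformed matrices are exactly diagonal, the two forms agree, and the test below checks this. In Appendix B the transform comes from the true $\Sigma_1$ and $\Sigma_2$, so the transformed estimates generally have nonzero off-diagonal elements, which (B.4) and (B.5) do not use. All function names here are hypothetical.

```python
import numpy as np


def simultaneous_diagonalize(s1, s2):
    """Return (A, lam) with A^T s1 A = I and A^T s2 A = diag(lam)."""
    w1, v1 = np.linalg.eigh(s1)
    whiten = v1 / np.sqrt(w1)                    # whiten^T s1 whiten = I
    lam, v2 = np.linalg.eigh(whiten.T @ s2 @ whiten)
    return whiten @ v2, lam


def sigma_sq_direct(s1, s2, m1, m2):
    """(B.1) and (B.2) evaluated directly from covariance matrices and means."""
    p = len(m1)
    dm = np.asarray(m2, dtype=float) - np.asarray(m1, dtype=float)
    i1, i2 = np.linalg.inv(s1), np.linalg.inv(s2)
    e1 = np.eye(p) - s1 @ i2
    e2 = i1 @ s2 - np.eye(p)
    var1 = 2.0 * (np.trace(e1 @ e1) + 2.0 * dm @ i2 @ s1 @ i2 @ dm)
    var2 = 2.0 * (np.trace(e2 @ e2) + 2.0 * dm @ i1 @ s2 @ i1 @ dm)
    return float(var1), float(var2)


def sigma_sq_diagonal(x, y, d):
    """(B.4) and (B.5) from diagonal elements x (class 1), y (class 2) and mean difference d."""
    x, y, d = (np.asarray(v, dtype=float) for v in (x, y, d))
    var1 = 2.0 * np.sum(1 - 2 * x / y + x**2 / y**2 + 2 * d**2 * x / y**2)
    var2 = 2.0 * np.sum(y**2 / x**2 - 2 * y / x + 1 + 2 * d**2 * y / x**2)
    return float(var1), float(var2)
```

Here the mean difference transforms as $d = A^T(M_2 - M_1)$, which keeps the quadratic forms in (B.1) and (B.2) unchanged.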

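We now look at the probability density functions of the one-dimensional elements $\hat X_i$ and $\hat Y_i$.

Theorem B.4

If $\sigma_{ij} = 0$ for $i \ne j$, and if $A$ is distributed according to $W(\Sigma, n)$, then $A_{11}, A_{22}, \ldots, A_{pp}$ are independently distributed and $A_{jj}$ is distributed according to $W(\sigma_{jj}, n)$.

Proof

See reference (B.1), p. 163.

Therefore, $\hat X_1, \ldots, \hat X_p$ are each distributed $W(1/n_1, n_1)$, and $\hat Y_i$ is distributed $W(\lambda_i/n_2, n_2)$, $i = 1, \ldots, p$. Hence,

$$f(\hat X_i) = \begin{cases} \dfrac{(n_1/2)^{n_1/2}}{\Gamma(n_1/2)}\,\hat X_i^{(n_1-2)/2}\exp\left(-\tfrac{1}{2}n_1\hat X_i\right), & \hat X_i > 0 \\ 0, & \hat X_i \le 0 \end{cases} \qquad (B.6)$$

A similar expression exists for $\hat X_i^{-1}$, with $\hat X_i$ replaced by $\hat X_i^{-1}$.

$$f(\hat Y_i) = \begin{cases} \dfrac{(n_2/2)^{n_2/2}}{\Gamma(n_2/2)\,\lambda_i^{n_2/2}}\,\hat Y_i^{(n_2-2)/2}\exp\left(-\dfrac{n_2\hat Y_i}{2\lambda_i}\right), & \hat Y_i > 0 \\ 0, & \hat Y_i \le 0 \end{cases} \qquad (B.7)$$

A similar expression exists for $\hat Y_i^{-1}$, with $\hat Y_i$ and $\lambda_i$ replaced by $\hat Y_i^{-1}$ and $\lambda_i^{-1}$.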

Looking at equations (B.4) and (B.5), we see that even though we know the individual distributions of $\hat X_i$ and $\hat Y_i$, calculating the densities of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ is still a very involved and difficult process. Arriving at these densities directly from those expressions is almost impossible. However, the moments of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ can be calculated.

Calculating the moments of $\hat X_i$ (and $\hat X_i^{-1}$) involves evaluating an integral of the type $\int_0^\infty t^n e^{-at}\,dt$. Since such an integral does indeed exist, calculating any moment of $\hat X_i$, $\hat X_i^{-1}$, $\hat Y_i$, and $\hat Y_i^{-1}$ is a very easy task.

From any table of integrals, we find:

$$\int_0^\infty t^n e^{-at}\,dt = \frac{\Gamma(n+1)}{a^{n+1}} \qquad (n > -1,\ a > 0) \qquad (B.8)$$

Thus, if $\hat x$ is distributed $W(x/n, n)$, then:

$$E(\hat x) = x$$

$$E(\hat x^2) = \left(1 + \frac{2}{n}\right)x^2$$

$$E(\hat x^3) = \left(1 + \frac{6}{n} + \frac{8}{n^2}\right)x^3$$

$$E(\hat x^4) = \left(1 + \frac{12}{n} + \frac{44}{n^2} + \frac{48}{n^3}\right)x^4 \qquad (B.9)$$
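The factors in (B.9) can be checked exactly. For $p = 1$, a $W(x/n, n)$ variable is $x$ times a $\chi^2_n$ variable divided by $n$. Therefore $E(\hat x^k)/x^k = n(n+2)\cdots(n+2k-2)/n^k$, and the equivalent gamma-function form follows from (B.8). This is a minimal sketch using Python's `fractions` and `math` modules, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def moment_factor(n, k):
    """Exact E(x_hat^k) / x^k for x_hat ~ W(x/n, n), i.e. x_hat = x * chi2_n / n.

    E[(chi2_n)^k] = n (n + 2) ... (n + 2k - 2), so the factor is prod_{j<k} (n + 2j) / n.
    """
    factor = Fraction(1)
    for j in range(k):
        factor *= Fraction(n + 2 * j, n)
    return factor


def moment_factor_gamma(n, k):
    """Same factor from the gamma-function integral (B.8):
    E[(chi2_n / n)^k] = (2 / n)^k * Gamma(n/2 + k) / Gamma(n/2)."""
    return (2.0 / n) ** k * math.exp(math.lgamma(n / 2 + k) - math.lgamma(n / 2))
```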


Since any moment of $\hat\sigma_1^2$ or $\hat\sigma_2^2$ is a function of the moments of $\hat X_i$, $\hat X_i^{-1}$, $\hat Y_i$, and $\hat Y_i^{-1}$, it is theoretically possible to calculate any moment of $\hat\sigma_1^2$ and $\hat\sigma_2^2$. Thus, it is theoretically possible to calculate the characteristic function of $\hat\sigma_1^2$ or $\hat\sigma_2^2$ uniquely from these moments.

Papoulis (B.3) provides a way to estimate the probability density function of a random variable once its characteristic function is known. However, calculating the characteristic function from the moments of a random variable converges very slowly, so a large number of moments would have to be calculated. Equations (B.4) and (B.5) make it evident that, beyond the first few moments, the derivation becomes a formidable task and is very impractical.

Because of these difficulties, it was decided to calculate only the variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ and to incorporate them heuristically into the algorithm developed.
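Appendix C

Derivation of the Variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$

We look first at $\hat\sigma_1^2$. From Appendix B, equation (B.4), we have

$$\hat\sigma_1^2 = 2\sum_{i=1}^{p}\left[1 - 2\frac{\hat X_i}{\hat Y_i} + \frac{\hat X_i^2}{\hat Y_i^2} + 2d_i^2\frac{\hat X_i}{\hat Y_i^2}\right] \qquad (C.1)$$

Noting the assumption that the $\hat X_i$'s are independent of the $\hat Y_i$'s, and taking the expected value of $\hat\sigma_1^2$, we get

$$E(\hat\sigma_1^2) = 2\sum_{i=1}^{p}\left[1 - 2E(\hat X_i)E(\hat Y_i^{-1}) + E(\hat X_i^2)E(\hat Y_i^{-2}) + 2d_i^2E(\hat X_i)E(\hat Y_i^{-2})\right] \qquad (C.2)$$

Making use of the expressions in (B.9), we get

$$E(\hat\sigma_1^2) = 2\sum_{i=1}^{p}\left[1 - \frac{2}{\lambda_i} + \left(1+\frac{2}{n_1}\right)\left(1+\frac{2}{n_2}\right)\frac{1}{\lambda_i^2} + 2d_i^2\left(1+\frac{2}{n_2}\right)\frac{1}{\lambda_i^2}\right] \qquad (C.3)$$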


Now note that $\hat\sigma_1^2$ and $\hat\sigma_2^2$ are sums of uncorrelated random variables. The $\hat X_i$'s are independent, the $\hat Y_i$'s are independent, and each $\hat X_i$ is independent of each $\hat Y_j$. Therefore any function of $\hat X_i$ and $\hat Y_i$ in one dimension is uncorrelated with any function of $\hat X_j$ and $\hat Y_j$ in another dimension. Hence, the variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ consist of the sum of the variances in each dimension (see (69), p. 211), with no cross-product terms between dimensions. For this reason, the following derivations do not attempt to derive any cross-product terms, as they cancel out in the end result.







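$$E\left[(\hat\sigma_1^2)^2\right] = 4\sum_{i=1}^{p}\Big[1 - 4E(\hat X_i)E(\hat Y_i^{-1}) + 6E(\hat X_i^2)E(\hat Y_i^{-2}) - 4E(\hat X_i^3)E(\hat Y_i^{-3}) + E(\hat X_i^4)E(\hat Y_i^{-4}) + 4d_i^2E(\hat X_i)E(\hat Y_i^{-2}) - 8d_i^2E(\hat X_i^2)E(\hat Y_i^{-3}) + 4d_i^2E(\hat X_i^3)E(\hat Y_i^{-4}) + 4d_i^4E(\hat X_i^2)E(\hat Y_i^{-4})\Big] + \text{cross-product terms} \qquad (C.4)$$

We substitute the expressions of (B.9) into (C.4), writing $\alpha_k(n)$ for the factor that multiplies $x^k$ in (B.9). Thus $\alpha_2(n) = 1 + 2/n$, $\alpha_3(n) = 1 + 6/n + 8/n^2$, and $\alpha_4(n) = 1 + 12/n + 44/n^2 + 48/n^3$, and we get

$$E\left[(\hat\sigma_1^2)^2\right] = 4\sum_{i=1}^{p}\Big[1 - \frac{4}{\lambda_i} + \frac{6\alpha_2(n_1)\alpha_2(n_2)}{\lambda_i^2} - \frac{4\alpha_3(n_1)\alpha_3(n_2)}{\lambda_i^3} + \frac{\alpha_4(n_1)\alpha_4(n_2)}{\lambda_i^4} + \frac{4d_i^2\alpha_2(n_2)}{\lambda_i^2} - \frac{8d_i^2\alpha_2(n_1)\alpha_3(n_2)}{\lambda_i^3} + \frac{4d_i^2\alpha_3(n_1)\alpha_4(n_2)}{\lambda_i^4} + \frac{4d_i^4\alpha_2(n_1)\alpha_4(n_2)}{\lambda_i^4}\Big] + \text{cross-product terms} \qquad (C.5)$$

$$\left[E(\hat\sigma_1^2)\right]^2 = 4\sum_{i=1}^{p}\left[1 - \frac{2}{\lambda_i} + \frac{\alpha_2(n_1)\alpha_2(n_2)}{\lambda_i^2} + \frac{2d_i^2\alpha_2(n_2)}{\lambda_i^2}\right]^2 + \text{cross-product terms} \qquad (C.6)$$

Now, $\operatorname{Var}(\hat\sigma_1^2) = E\left[(\hat\sigma_1^2)^2\right] - \left[E(\hat\sigma_1^2)\right]^2$, or

$$\operatorname{Var}(\hat\sigma_1^2) = 4\sum_{i=1}^{p}\Big[\frac{4(\alpha_2(n_1)\alpha_2(n_2) - 1)}{\lambda_i^2} - \frac{4(\alpha_3(n_1)\alpha_3(n_2) - \alpha_2(n_1)\alpha_2(n_2))}{\lambda_i^3} + \frac{\alpha_4(n_1)\alpha_4(n_2) - \alpha_2^2(n_1)\alpha_2^2(n_2)}{\lambda_i^4} - \frac{8d_i^2(\alpha_2(n_1)\alpha_3(n_2) - \alpha_2(n_2))}{\lambda_i^3} + \frac{4d_i^2(\alpha_3(n_1)\alpha_4(n_2) - \alpha_2(n_1)\alpha_2^2(n_2))}{\lambda_i^4} + \frac{4d_i^4(\alpha_2(n_1)\alpha_4(n_2) - \alpha_2^2(n_2))}{\lambda_i^4}\Big] \qquad (C.7)$$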
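Equations (C.3) and (C.7) can be evaluated exactly with rational arithmetic. This sketch follows Appendix B's treatment. It takes $E(\hat X_i^k) = \alpha_k(n_1)$ and $E(\hat Y_i^{-k}) = \alpha_k(n_2)/\lambda_i^k$, the latter from Theorem B.2, where $\alpha_k(n)$ is the factor multiplying $x^k$ in (B.9). The names `alpha` and `mean_var_sigma1_sq` are hypothetical. In practice, the estimates $\hat\lambda_i$ would be passed in place of $\lambda_i$.

```python
from fractions import Fraction


def alpha(n, k):
    """Factor of (B.9): E(x_hat^k) = alpha(n, k) * x^k for x_hat ~ W(x/n, n)."""
    f = Fraction(1)
    for j in range(k):
        f *= Fraction(n + 2 * j, n)
    return f


def mean_var_sigma1_sq(lams, ds, n1, n2):
    """E and Var of the estimated sigma_1^2, Eqs. (C.3) and (C.7)."""
    a = [alpha(n1, k) for k in range(5)]   # class-1 factors, a[0] = a[1] = 1
    m = [alpha(n2, k) for k in range(5)]   # class-2 factors
    mean, var = Fraction(0), Fraction(0)
    for lam, d in zip(lams, ds):
        u, d2 = 1 / Fraction(lam), Fraction(d) ** 2
        mean += 2 * (1 - 2 * u + a[2] * m[2] * u**2 + 2 * d2 * m[2] * u**2)
        var += 4 * (4 * (a[2] * m[2] - 1) * u**2
                    - 4 * (a[3] * m[3] - a[2] * m[2]) * u**3
                    + (a[4] * m[4] - a[2]**2 * m[2]**2) * u**4
                    - 8 * d2 * (a[2] * m[3] - m[2]) * u**3
                    + 4 * d2 * (a[3] * m[4] - a[2] * m[2]**2) * u**4
                    + 4 * d2**2 * (a[2] * m[4] - m[2]**2) * u**4)
    return mean, var
```

For example, with $p = 1$, $\lambda_1 = 1$, $d_1 = 0$, and $n_1 = n_2 = 2$, the sketch gives $E(\hat\sigma_1^2) = 6$ and $\operatorname{Var}(\hat\sigma_1^2) = 1776$.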
Next, we look at $\hat\sigma_2^2$. From Appendix B, equation (B.5), we have

$$\hat\sigma_2^2 = 2\sum_{i=1}^{p}\left[\frac{\hat Y_i^2}{\hat X_i^2} - 2\frac{\hat Y_i}{\hat X_i} + 1 + 2d_i^2\frac{\hat Y_i}{\hat X_i^2}\right] \qquad (C.8)$$

$$E(\hat\sigma_2^2) = 2\sum_{i=1}^{p}\left[E(\hat Y_i^2)E(\hat X_i^{-2}) - 2E(\hat Y_i)E(\hat X_i^{-1}) + 1 + 2d_i^2E(\hat Y_i)E(\hat X_i^{-2})\right] = 2\sum_{i=1}^{p}\left[\left(1+\frac{2}{n_1}\right)\left(1+\frac{2}{n_2}\right)\lambda_i^2 - 2\lambda_i + 1 + 2d_i^2\left(1+\frac{2}{n_1}\right)\lambda_i\right] \qquad (C.9)$$

Proceeding as for $\hat\sigma_1^2$, and writing $\alpha_k(n)$ for the factor that multiplies $x^k$ in (B.9), we get

$$E\left[(\hat\sigma_2^2)^2\right] = 4\sum_{i=1}^{p}\Big[1 - 4E(\hat Y_i)E(\hat X_i^{-1}) + 6E(\hat Y_i^2)E(\hat X_i^{-2}) - 4E(\hat Y_i^3)E(\hat X_i^{-3}) + E(\hat Y_i^4)E(\hat X_i^{-4}) + 4d_i^2E(\hat Y_i)E(\hat X_i^{-2}) - 8d_i^2E(\hat Y_i^2)E(\hat X_i^{-3}) + 4d_i^2E(\hat Y_i^3)E(\hat X_i^{-4}) + 4d_i^4E(\hat Y_i^2)E(\hat X_i^{-4})\Big] + \text{cross-product terms}$$

$$= 4\sum_{i=1}^{p}\Big[1 - 4\lambda_i + 6\alpha_2(n_1)\alpha_2(n_2)\lambda_i^2 - 4\alpha_3(n_1)\alpha_3(n_2)\lambda_i^3 + \alpha_4(n_1)\alpha_4(n_2)\lambda_i^4 + 4d_i^2\alpha_2(n_1)\lambda_i - 8d_i^2\alpha_3(n_1)\alpha_2(n_2)\lambda_i^2 + 4d_i^2\alpha_4(n_1)\alpha_3(n_2)\lambda_i^3 + 4d_i^4\alpha_4(n_1)\alpha_2(n_2)\lambda_i^2\Big] + \text{cross-product terms} \qquad (C.10)$$

$$\left[E(\hat\sigma_2^2)\right]^2 = 4\sum_{i=1}^{p}\left[\alpha_2(n_1)\alpha_2(n_2)\lambda_i^2 - 2\lambda_i + 1 + 2d_i^2\alpha_2(n_1)\lambda_i\right]^2 + \text{cross-product terms} \qquad (C.11)$$

$\operatorname{Var}(\hat\sigma_2^2) = E\left[(\hat\sigma_2^2)^2\right] - \left[E(\hat\sigma_2^2)\right]^2$, or

$$\operatorname{Var}(\hat\sigma_2^2) = 4\sum_{i=1}^{p}\Big[4(\alpha_2(n_1)\alpha_2(n_2) - 1)\lambda_i^2 - 4(\alpha_3(n_1)\alpha_3(n_2) - \alpha_2(n_1)\alpha_2(n_2))\lambda_i^3 + (\alpha_4(n_1)\alpha_4(n_2) - \alpha_2^2(n_1)\alpha_2^2(n_2))\lambda_i^4 - 8d_i^2(\alpha_3(n_1)\alpha_2(n_2) - \alpha_2(n_1))\lambda_i^2 + 4d_i^2(\alpha_4(n_1)\alpha_3(n_2) - \alpha_2^2(n_1)\alpha_2(n_2))\lambda_i^3 + 4d_i^4(\alpha_4(n_1)\alpha_2(n_2) - \alpha_2^2(n_1))\lambda_i^2\Big] \qquad (C.12)$$
Because we do not know the true values of $\lambda_i$, we substitute the estimates $\hat\lambda_i$ for $\lambda_i$ in equations (C.7) and (C.12).
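Equations (C.9) and (C.12) for $\hat\sigma_2^2$ can be evaluated in the same way. This self-contained sketch takes $E(\hat Y_i^k) = \alpha_k(n_2)\lambda_i^k$ and $E(\hat X_i^{-k}) = \alpha_k(n_1)$, following Appendix B's treatment of the inverses (Theorem B.2), where $\alpha_k(n)$ is the factor multiplying $x^k$ in (B.9). The names `alpha` and `mean_var_sigma2_sq` are hypothetical, and the estimates $\hat\lambda_i$ would be passed as `lams`.

```python
from fractions import Fraction


def alpha(n, k):
    """Factor of (B.9): E(x_hat^k) = alpha(n, k) * x^k for x_hat ~ W(x/n, n)."""
    f = Fraction(1)
    for j in range(k):
        f *= Fraction(n + 2 * j, n)
    return f


def mean_var_sigma2_sq(lams, ds, n1, n2):
    """E and Var of the estimated sigma_2^2, Eqs. (C.9) and (C.12)."""
    a = [alpha(n1, k) for k in range(5)]   # factors for powers of X_hat^-1 (class 1)
    m = [alpha(n2, k) for k in range(5)]   # factors for powers of Y_hat (class 2)
    mean, var = Fraction(0), Fraction(0)
    for lam, d in zip(lams, ds):
        lam, d2 = Fraction(lam), Fraction(d) ** 2
        mean += 2 * (a[2] * m[2] * lam**2 - 2 * lam + 1 + 2 * d2 * a[2] * lam)
        var += 4 * (4 * (a[2] * m[2] - 1) * lam**2
                    - 4 * (a[3] * m[3] - a[2] * m[2]) * lam**3
                    + (a[4] * m[4] - a[2]**2 * m[2]**2) * lam**4
                    - 8 * d2 * (a[3] * m[2] - a[2]) * lam**2
                    + 4 * d2 * (a[4] * m[3] - a[2]**2 * m[2]) * lam**3
                    + 4 * d2**2 * (a[4] * m[2] - a[2]**2) * lam**2)
    return mean, var
```

With $\lambda_i = 1$ and $n_1 = n_2$, the two classes play symmetric roles, and the result matches that for $\hat\sigma_1^2$.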
Appendix D

Classification Results

Tables

Table D.1 Classification Results of Aircraft, Simulated Data, Using 20 Samples per Class.

Table D.2 Classification Results of Aircraft, Simulated Data, Using 13 Samples per Class.

[Table entries illegible in the scanned original.]

Table D.3 Classification Results of Aircraft, Real Data, Using 20 Samples per Class.

Classification Results of Aircraft, Real Data, Using 13 Samples per Class.

Table D.7 Classification Results of Landsat, Multitemporal
Appendix E

Computer Program Listings


FILE: SWRITE FORTRAN A       LARS / PURDUE UNIVERSITY

C     WRITTEN BY BILL PFAFF
C     EDITED BY MARWAN MUASHER, JUNE 14, 1980
C
C     THIS PROGRAM GENERATES SIMULATED DATA BASED ON A
C     CLASSIFICATION MAP OR A GROUND TRUTH MAP. EACH PIXEL
C     GENERATED THUS COMES FROM A KNOWN CLASS DISTRIBUTION. THE
C     METHOD USED IS AS FOLLOWS:
C
C     1. A GOOD CLASSIFICATION IS CHOSEN AS A BASE FOR
C        SIMULATED DATA.
C     2. FROM THIS CLASSIFICATION WE KNOW THE NUMBER OF CLASSES, THE
C        CLASS STATISTICS, AND THE CLASS OF EACH PIXEL IN THE
C        AREA CLASSIFIED.
C     3. A STREAM OF UNIFORM RANDOM NUMBERS IS GENERATED FOR
C        EACH CHANNEL. THEY ARE CHANGED TO NORMAL (0,1) DEVIATES.
C     4. FOR EACH PIXEL, A RANDOM N(0,I) VECTOR IS TRANSFORMED TO
C        BE DISTRIBUTED ACCORDING TO THE CLASS STATISTICS OF THAT
C        PIXEL. THIS IS THE SIMULATED DATA VECTOR.
C     5. AS EACH LINE IS COMPLETED, IT IS WRITTEN TO AN OUTPUT TAPE.
C
C     TO RUN THE PROGRAM, YOU NEED TO HAVE THE FOLLOWING
C     EXEC FILE ON YOUR DISK:
C
C        GETDISK LARSYS
C        GETDISK DV5YS
C        GLOBAL TXTLIB CMSLIB FORTRAN SSP370
C        FILEDEF 6 PRINTER
C        FILEDEF 16 TERMINAL
C        FILEDEF 12 TAP2
C        FILEDEF 11 TAP1 (RECFM VS LRECL 1500 BLKSIZE 1500)
C        LOAD SWRITE OLOCOM MMTAPE TAPOP DCOVAL GTSERL GTDATE MFSD
C             RANDU WRTMTX
C        START SWRITE
C
C     THE PROGRAM WILL ASK FOR INFORMATION SUCH AS
C     TAPE NUMBERS, FILE NUMBERS, ETC. FROM HERE ON, IT
C     SHOULD BE EASY TO FOLLOW.
C
C     VARIABLES USED IN TPRINT
C
C     A      = COVARIANCE STORAGE FOR FACTORING
C     AREANO = AREA NUMBER OF CLASSIFICATION
C     B      = COVARIANCE STORAGE FOR MULTIPLICATION
C     DATA   = DATA POINT STORAGE
C     DATVAL = LINE NUMBER AND ROLL PARAMETER
C     ICAL   = CALIBRATION INFORMATION
C     IDREC  = IDENTIFICATION RECORD STORAGE
C     ISTART = STARTING POINTS FOR GAUSS
C     LOGDAT = DATA POINTS IN LOGICAL FORMAT
C     NOCHAN = NUMBER OF CHANNELS IN CLASSIFICATION
C     NOCLAS = NUMBER OF CLASSES IN ORIGINAL STATISTICS
C     NOFLDS = NUMBER OF TEST FIELDS
C     NOPOOL = NUMBER OF POOLED CLASSES
C     PNTCLS = CLASSIFICATIONS ARRAY
C     Z      = STATISTICS STORAGE


C     INITIALIZATION
C
C     ****************************************************************
C
C     TYPE DECLARATIONS (ONLY PARTLY LEGIBLE IN THE SOURCE SCAN):
C       INTEGER*2 I2, INTDAT, ICAL(3), ILIN(2), PNTCLS(1000), ISTAT,
C                 FETVC3
C       LOGICAL*1 L1, LOGDAT, LCAL, DATOUT
C       REAL*4    A, A2, Z, B(12,12), DATA, RMEAN, RVAR(30,12,12), FROCAL
C       INTEGER*4 TAPENO, THREE, CLAPNT, IMEAN, IVAR, YES, NO, DATE
C
      EQUIVALENCE (I2,L1(1)),(INTDAT,LOGDAT),(ICAL,LCAL),(LNWRT,ILIN)
C     (A SECOND EQUIVALENCE STATEMENT INVOLVING IDREC AND A DATA
C      STATEMENT DEFINING EOS ARE ILLEGIBLE IN THE SOURCE SCAN)
      DATA YES,NO,THREE/'YES ','NO  ','3   '/
C     (A DATA STATEMENT DEFINING PLOT AND THE ASSIGNMENT OF EPS ARE
C      ILLEGIBLE IN THE SOURCE SCAN)

C     ****** LOAD TAPES AND READ PARAMETERS ******
C
      WRITE(16,500)
  500 FORMAT(//5X,'SPECIFY TAPE NUMBER ON WHICH RESULTS FILE IS LOCATED'
     * /5X,'(TYPE EIGHT DIGIT TAPE NUMBER)')
      READ(16,505) INTAP
  505 FORMAT(I8)
      WRITE(16,510)
C     (FORMAT 510 IS MISSING FROM THE SOURCE SCAN)
      READ(16,515) IFILE
  515 FORMAT(I3)
      CALL MMTAPE(INTAP,IFILE,0)
      WRITE(16,570)
  570 FORMAT(//5X,'SPECIFY THE TAPE NUMBER ONTO WHICH SIMULATED DATA IS'
     * /5X,'(TYPE EIGHT DIGIT TAPE NUMBER)')
      READ(16,575) TAPENO
  575 FORMAT(I8)
      WRITE(16,580)
  580 FORMAT(5X,'SPECIFY FILE NUMBER AT WHICH SIMULATED DATA IS TO BE WR
     * ITTEN'/5X,'(TYPE THREE DIGIT FILE NUMBER)')
      READ(16,585) JFILE
  585 FORMAT(I3)
      WRITE(16,590)
  590 FORMAT(//5X,'SPECIFY THE RUN NUMBER FOR THE SIMULATED DATA RUN'/
     1 5X,'(TYPE EIGHT DIGIT RUN NUMBER)')
      READ(16,575) RUNNO
      CALL MOUNT(TAPENO,12,'R1')
      MARC=JFILE-1
      IF(MARC.LE.0) GO TO 3
      DO 3 LIP=1,MARC
      CALL TOPFF(12)
    3 CONTINUE
    5 READ(11) I
      IF(I.NE.1) GO TO 310
      READ(11) I,J,NOCLAS,NOCHAN,NOFLDS,NOPOOL,(FETVC3(IX),IX=1,NOCHAN)
      NOCH=((NOCHAN+1)/2)*2
      NOCOMP=NOCHAN*(NOCHAN+1)/2
      ISTOP=NOCOMP*NOPOOL
      IEND=ISTOP+NOCHAN*NOPOOL
   15 READ(11) I,J,K
      IF(I.LT.3) GO TO 15
      IF(K.NE.EOS) GO TO 15
      READ(11) I,J,(Z(IX),IX=1,IEND)
      DO 17 IX=1,IEND
C     (THE TARGET OF THE ASSIGNMENT ...(IX)=Z(IX) IS ILLEGIBLE IN THE
C      SOURCE SCAN)
   17 CONTINUE
   45 READ(11) I,AREANO,NOPNTS,NOLINE,INFO,IDREC
C     (AN ASSIGNMENT FROM NOCHAN IS ILLEGIBLE IN THE SOURCE SCAN)
      IF(I.NE.5) GO TO 45
      WRITE(6,520)
  520 FORMAT(1H1////5X,' ')
      WRITE(6,525)
  525 FORMAT(5X,'+DATA SIMULATION USING MCCABES EQUATION+')
      WRITE(6,530)
  530 FORMAT(5X,'+++')
      WRITE(6,535) RUNNO,IDREC(3)
  535 FORMAT(///5X,'SIMULATED DATA RUN IS',I9,' FROM RUN',I9)
      WRITE(6,537) INFO(4),INFO(5),INFO(7),INFO(8)
  537 FORMAT(/5X,'LINE',I5,' TO LINE',I5,' AND COLUMN',I5,' TO COLUMN',
     * I5)
      WRITE(6,540) INTAP,IFILE
  540 FORMAT(/5X,'INPUT RESULTS FILE IS ON TAPE',I9,' FILE',I4)
      WRITE(6,545) TAPENO,JFILE
  545 FORMAT(/5X,'SIMULATED DATA IS ON TAPE',I9,' FILE',I4)
      WRITE(6,550)
  550 FORMAT(/5X,'CHANNELS USED')
      DO 560 IX=1,NOCHAN
      WRITE(6,555) FETVC3(IX),FROCAL(1,IX),FROCAL(2,IX)
  555 FORMAT(5X,I2,2X,F5.2,F5.2)
  560 CONTINUE
      CALL GTDATE(DATE)
      WRITE(6,565) DATE
  565 FORMAT(/5X,'DATE OF SIMULATION IS ',3A4)
      IBEG=1
C
C     ****** FACTOR COVARIANCE MATRICES ******
C
      DO 30 IX=1,NOPOOL
      IDONE=IBEG+NOCOMP-1
      K=0
      DO 20 IY=IBEG,IDONE
      K=K+1
   20 A(K)=Z(IY)
      CALL MFSD(A,NOCHAN,EPS,IER)
      IF(IER.EQ.-1) GO TO 300
      IF(IER.GE.1) GO TO 310
      K=0
      DO 25 IY=IBEG,IDONE
      K=K+1
   25 Z(IY)=A(K)
   30 IBEG=IBEG+NOCOMP
C
C     ****** GENERATE STARTING POINTS ******
C
   29 WRITE(16,31)
   31 FORMAT(//5X,'DO YOU WANT TO SPECIFY THE STARTING POINTS FOR THE'/
     * 5X,'RANDOM NUMBER GENERATOR? (TYPE YES OR NO)')
      READ(16,32) INPUT
   32 FORMAT(A4)
      IF(INPUT.EQ.NO) GO TO 36
      IF(INPUT.EQ.YES) GO TO 33
      GO TO 29
   33 DO 39 IX=1,NOCHAN
      WRITE(16,41) IX
   41 FORMAT(5X,'SPECIFY STARTING POINT FOR CHANNEL',I3/5X,'(TYPE A NINE
     * DIGIT ODD NUMBER)')
      READ(16,42) ISTART(IX)
   42 FORMAT(I9)
   39 CONTINUE
      GO TO 43
   36 CALL GTSERL(ISERL)
      ISERL=(ISERL/10)*8+1
      DO 40 I=1,NOCH
      ISERL=ISERL+1000000
      ISTART(I)=ISERL
   40 CONTINUE
   43 WRITE(6,34)
   34 FORMAT(////5X,'STARTING POINTS FOR RANDOM NUMBER GENERATOR'//)
      DO 44 I=1,NOCHAN
      WRITE(6,35) I,ISTART(I)
   35 FORMAT(5X,'STARTING POINT FOR CHANNEL ',I2,' IS ',I9)
   44 CONTINUE


C     ****** READ CLASSIFICATIONS ******
C
      IDREC(1)=TAPENO
      IDREC(2)=JFILE
      IDREC(3)=RUNNO
C     (THE NEXT FEW ASSIGNMENTS ARE PARTLY ILLEGIBLE IN THE SOURCE SCAN:
C      A VARIABLE IS SET FROM IDREC(5); IDREC ENTRIES ARE SET TO NOCHAN,
C      TO 4*((NOPNTS+9)/4) AND TO PLOT; NDSAM IS SET FROM IDREC(6))
      DO 141 II=1,3
      IDREC(II+16)=DATE(II)
  141 CONTINUE
      IDREC(20)=NOLINE
      DO 145 II=1,NOCHAN
      INEW=FETVC3(II)
      DO 145 I12=1,5
      FROCAL(I12,II)=FROCAL(I12,INEW)
  145 CONTINUE
      LIP=NOCHAN+1
      DO 150 II=LIP,NDLD
      DO 150 I12=1,5
      FROCAL(I12,II)=0.0
  150 CONTINUE

      CALL TOPWR(12,800,IER,IDREC)
      IF(IER.NE.0) WRITE(16,234) IER
      IF(IER.GT.0) GO TO 310
      DO 50 MA=1,NOCLAS
      CLAPNT(MA)=0
      DO 50 MB=1,NOCHAN
      IMEAN(MA,MB)=0
      RMEAN(MA,MB)=0.0
      DO 50 MC=1,NOCHAN
      IVAR(MA,MB,MC)=0
   50 RVAR(MA,MB,MC)=0.0
      LNWRT=0
   55 READ(11) J,K,LINENO,(PNTCLS(IX),IX=1,NOPNTS)
C     (THE RECORD-TYPE CONSTANT IN THE NEXT TEST IS ILLEGIBLE IN THE
C      SOURCE SCAN; 6 IS ASSUMED)
      IF(J.GT.6) GO TO 95
      LNWRT=LNWRT+1
      IF(MOD(LNWRT,25).EQ.0) WRITE(16,57) LNWRT,NOLINE
   57 FORMAT(5X,I4,' LINES OUT OF ',I4,' ARE COMPLETED')
C
C     ****************************************************************
C     GENERATE AND WRITE DATA POINTS
C
      I2=ILIN(2)
      DATOUT(1)=L1(1)
      DATOUT(2)=L1(2)
      I2=32767
      DATOUT(3)=L1(1)
      DATOUT(4)=L1(2)
      I2=0
      ICOUNT=4
      DO 90 IX=1,NOPNTS
      ICOUNT=ICOUNT+1
      I2=PNTCLS(IX)
      L1(1)=.FALSE.
      IPOL=(I2-1)*NOCHAN
      IBEG=(I2-1)*NOCOMP
      K=IBEG
C     COPY THE FACTORED COVARIANCE OF THIS CLASS INTO B AS A LOWER
C     TRIANGULAR MATRIX (TRANSPOSE OF THE MFSD UPPER TRIANGULAR FACTOR)
      DO 65 IY=1,NOCHAN
      DO 65 IZ=1,IY
      K=K+1
      B(IY,IZ)=Z(K)
      IF(IY.EQ.IZ) GO TO 65
      B(IZ,IY)=0.0
   65 CONTINUE
C     BOX-MULLER: TWO UNIFORM DEVIATES GIVE ONE N(0,1) DEVIATE PER CHANNEL
      DO 70 IY=1,NOCH
      CALL RANDU(ISTART(IY),NXINP,A2(IY))
      ISTART(IY)=NXINP
      CALL RANDU(ISTART(IY),NXINP,A(IY))
      ISTART(IY)=NXINP
      A(IY)=SQRT(-2.*ALOG(A2(IY)))*COS(6.283185*A(IY))
   70 CONTINUE
      CLAPNT(I2)=CLAPNT(I2)+1
C     DATA = B*A + CLASS MEAN, ROUNDED AND CLIPPED TO 0..255
      DO 80 IY=1,NOCHAN
      DATA(IY)=0.0
      IQ=NOPOOL*NOCOMP+IPOL+IY
      DO 75 IZ=1,NOCHAN
   75 DATA(IY)=DATA(IY)+B(IY,IZ)*A(IZ)
      DATA(IY)=DATA(IY)+Z(IQ)
      INTDAT=DATA(IY)+.5
      IF(INTDAT.LT.0) INTDAT=0
      IF(INTDAT.GT.255) INTDAT=255
      ISTAT(IY)=INTDAT
      DATOUT((IY-1)*NDSAM+ICOUNT)=LOGDAT(2)
      DO 80 IZ=1,6
      DATOUT((IY-1)*NDSAM+ICOUNT+IZ)=.FALSE.
   80 CONTINUE
      DO 90 II=1,NOCHAN
      IMEAN(I2,II)=IMEAN(I2,II)+ISTAT(II)
      DO 90 JJ=II,NOCHAN
      IVAR(I2,II,JJ)=IVAR(I2,II,JJ)+ISTAT(II)*ISTAT(JJ)
   90 CONTINUE
C     (THE ASSIGNMENT OF THE RECORD LENGTH NOBYTE, AN EXPRESSION IN 4,
C      NOCHAN AND NDSAM, IS ILLEGIBLE IN THE SOURCE SCAN)
      CALL TOPWR(12,NOBYTE,IER,DATOUT)
      IF(IER.NE.0) WRITE(16,234) IER
      IF(IER.GT.0) GO TO 310
      GO TO 55
   95 CONTINUE




98 


100 


609 

610 
619 


6S0 

6SC 

639 


630 


639 


i 0fS 
w 

649 


690 


?34 


      DO 100 IP=1,NOCLAS
      DO 100 IO=1,NOCHAN
      IF(CLAPNT(IP).LE.0) GO TO 90
      RMEAN(IP,IO)=FLOAT(IMEAN(IP,IO))/FLOAT(CLAPNT(IP))
      DO 100 IT=IO,NOCHAN
      IF(CLAPNT(IP).LE.1) GO TO 100
      REPNT=FLOAT(CLAPNT(IP))
      REVAR=FLOAT(IVAR(IP,IO,IT))
      REMEAN=FLOAT(IMEAN(IP,IO))
      BEMEAN=FLOAT(IMEAN(IP,IT))
      RVAR(IP,IO,IT)=(1./(REPNT-1.))*(REVAR-REMEAN*BEMEAN/REPNT)
      RVAR(IP,IT,IO)=RVAR(IP,IO,IT)
      CONTINUE

      DO 649 IP=1,NOCLAS
      WRITE(6,609) IP,CLAPNT(IP)
      FORMAT(1H1/9X,'CLASS NUMBER ',I3,9X,I6,' POINTS'///)
      WRITE(6,610)
      FORMAT(37X,'ACTUAL',4X,'SIMULATED')
      WRITE(6,619)

FORMATODX, 'NEAN',7X, ')«AN'/) 

DO IX«l,NnCHAN 
      NINC=NOCOMP*NOCLAS+(IP-1)*NOCHAN
      WRITE(6,620) FETVC3(IX),FRQCAL(1,FETVC3(IX)),FRQCAL(2,FETVC3(IX)),
     *Z(NINC+IX),RMEAN(IP,IX)
      FORMAT(5X,'CHANNEL',I3,2X,'(',F9.2,'-',F9.2,')',9X,F8.3,3X,F8.3)

CONTINUE 

      WRITE(6,629)
      FORMAT(/////9X,'ACTUAL COVARIANCE MATRIX')
      DO 630 NO=1,NOCOMP
      NINC=(IP-1)*NOCOMP
      A(NO)=Z(NINC+NO)
      CALL WRTMTX(A,NOCHAN,FRQCAL,THREE,FETVC3)
      WRITE(6,639)
      FORMAT(////9X,'SIMULATED COVARIANCE MATRIX')
      NO=0
      DO 640 IO=1,NOCHAN
      DO 640 IN=1,IO
      NO=NO+1
      A(NO)=RVAR(IP,IO,IN)
      CALL WRTMTX(A,NOCHAN,FRQCAL,THREE,FETVC3)
      CONTINUE
      CALL TOPEF(IP,IER)
      DO 690 IX=3,200
      IDREC(IX)=0
      CALL TOPWR(12,800,IER,IDREC)
      IF(IER.NE.0) WRITE(16,234) IER
      IF(IER.GT.0) GO TO 310
      GO TO 320
      FORMAT(9X,'ERROR IS',I9)
C
C     ERROR MESSAGES
C
  300 WRITE(6,309)
  309 FORMAT(5X,'ERROR -1')
  310 WRITE(6,319)
  319 FORMAT(9X,'ERROR GT 1')
  320 STOP
      END
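The data-generation loop in BWRITE can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not a port of the listing. It assumes that `RANDU` is the IBM SSP multiplicative congruential generator (multiplier 65539, modulus 2^31, odd seeds). It also assumes that the lower-triangular matrix `D` unpacked from `Z` is a Cholesky-type factor of the class covariance, so the sketch simply calls `numpy.linalg.cholesky`. The function names `randu`, `box_muller`, `simulate_class` and `sample_stats` are hypothetical.

As in the listing, each channel draws two uniforms and combines them as sqrt(-2 ln U2) cos(2 pi U1), which is the Box-Muller transform. The resulting vector is multiplied by the triangular factor, the class mean is added, and each value is rounded and clamped to 0..255. The summary statistics use the same sum and cross-product formula as the RMEAN/RVAR loop: cov = (Σxy - ΣxΣy/n)/(n-1).

```python
import math
import numpy as np

M31 = 2 ** 31


def randu(seed):
    """One step of the (assumed) IBM SSP RANDU generator; seed must be odd.

    Returns (new_seed, uniform deviate in (0, 1)).
    """
    new_seed = (seed * 65539) % M31
    return new_seed, new_seed / M31


def box_muller(u1, u2):
    """Standard normal deviate written as in the listing: sqrt(-2 ln U2) * cos(2 pi U1)."""
    return math.sqrt(-2.0 * math.log(u2)) * math.cos(2.0 * math.pi * u1)


def simulate_class(mean, cov, n_points, seeds):
    """Generate n_points 8-bit 'pixels' with approximately the given mean and covariance."""
    mean = np.asarray(mean, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))  # lower-triangular factor (D)
    seeds = list(seeds)  # one generator state per channel, like ISTART(IY)
    pixels = []
    for _ in range(n_points):
        a = np.empty(len(mean))
        for ch in range(len(mean)):
            seeds[ch], u2 = randu(seeds[ch])  # first draw  -> A2(IY)
            seeds[ch], u1 = randu(seeds[ch])  # second draw -> A(IY)
            a[ch] = box_muller(u1, u2)
        data = chol @ a + mean
        # INTDAT = DATA + .5 (truncated to an integer), then clamped to 0..255
        pixels.append([min(255, max(0, int(x + 0.5))) for x in data])
    return np.array(pixels), seeds


def sample_stats(pixels):
    """Mean and unbiased covariance from sums and cross-products (RMEAN/RVAR style)."""
    x = np.asarray(pixels, dtype=float)
    n = x.shape[0]
    s = x.sum(axis=0)   # running sums, as accumulated in IMEAN
    sxy = x.T @ x       # cross-products, as accumulated in IVAR
    mean = s / n
    cov = (sxy - np.outer(s, s) / n) / (n - 1)
    return mean, cov
```

For example, `simulate_class([100.0, 80.0], [[25.0, 10.0], [10.0, 16.0]], 5000, seeds=[12345, 54321])` followed by `sample_stats` should recover roughly the requested mean and covariance. Rounding adds about 1/12 to each variance. RANDU is a weak generator by modern standards; it is kept here only to mirror the listing, and a new simulation would more naturally use `numpy.random.default_rng`.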





FILE HUGHES FORTRAN A LARS / PURDUE UNIVERSITY 




C     NAME: HUGHES FORTRAN
C
C     PROGRAM TO CALCULATE THE PROBABILITY OF ERROR FOR TWO CLASSES.
C     THE PROGRAM REQUIRES AN INPUT DECK IN THE READER FILE AS FOLLOWS:
C
C       FIRST CARD  (FORMAT I3)  NUMBER OF TRAINING SAMPLES OF CLASS 1
C       SECOND CARD (FORMAT I3)  NUMBER OF TRAINING SAMPLES OF CLASS 2
C       MEANS AND COVARIANCE MATRICES OF CLASS 1 AND 2 IN LARSYS FORMAT
C
C     THE PROGRAM GIVES AS OUTPUT THE PROBABILITY OF CORRECT
C     CLASSIFICATION FOR EACH CHANNEL SUBSET (CHANNEL 1; CHANNELS 1,2;
C     CHANNELS 1,2,3; ETC.), THE TRANSFORMATION MATRIX, AND THE
C     NEW MEAN AND COVARIANCE MATRICES.
C     THE PROGRAM REQUIRES THE FOLLOWING EXEC FILE:
C
C       GETDISK IMSL
C       GLOBAL TXTLIB FORTMOD2 CMSLIB DIMSLIB SIMSLIB
C       LOAD HUGHES
C       START
C
C     LIST OF VARIABLES
C
C     N1        NUMBER OF TRAINING SAMPLES OF CLASS 1
C     N2        NUMBER OF TRAINING SAMPLES OF CLASS 2
C     EGVAL1    EIGENVALUE VECTOR OF S1, S2 AFTER TRANSFORMATION
C     DD1       NEW MEAN VECTOR OF CLASS 1
C     DD2       NEW MEAN VECTOR OF CLASS 2
C     VSGMA(1)  VARIANCE OF H = VAR H(X/W1)
C     VSGMA(2)  VARIANCE OF H = VAR H(X/W2)
C     TRANS     TRANSFORMATION MATRIX
C     SS1NEW    NEW COVARIANCE MATRIX OF CLASS 1
C     SS2NEW    NEW COVARIANCE MATRIX OF CLASS 2
C     CONST     MULTIPLICATIVE FACTOR OF VAR(U) AND VAR(V)
C
C*********************************************************************

      IMPLICIT REAL*8 (A-H,O-Z)

REAl 4>D SIGMAJ (78), SIGMA2(7B>, AINV<70), WK< JOO). PS1S2U2. 12), 

«WR( I6b), Ml < 12). M2C:2), PERROR. EGVECO( 12,, 1 ), EGVECTU. 12). CC( 1. 12). 
.fEGVAl R(24 ). EGVECR<rOn). SIGM1S< 12. 1 -? ) , AA( 1 , 1). DEGVEC( 12, 12), 

♦ EC.VALl (12:, BA1ACH( 12). T.’-.KP 1 < 12), DDF GVC (12. 12), MEAWN (2 ) . MEANSJ2). 
*SGMN(2), SaM3(2), GAMAR(2) . GAMAS (2), ALPHR < 2 ) , ALPHS < 2 ) , 

*CR(2), CS(2), A(2>, D(2), Dr.LTAR£2), DELTAS < 2 ). DIST < 2 ), ERROR <2) . 
*SS1N:-W!7U), S3?NEW(7a), ASCM5(2>. AS0i-.H(2>. 

VSIUII; set?, 12), DDl ( 12), I)D2< 12), TRAN J( 12, 12). TRAN51 ( 12, 12). 

• LAhi OA. U.,K<400>. MEANSl ( 12. 2). MEANRl ( 12, 2). SGM51 < 12, 2 ) . SGM)?1 ( 12, 2) 
»DS .’1 ( 12 ) . VSGMAJ?) 

      COMPLEX*16 EGVAL(12),EGVEC(12,12),ZN,
     *X1,X2,D1(12),D2(12)
      EQUIVALENCE (EGVAL(1),EGVALR(1)),(EGVEC(1,1),EGVECR(1))

C*********************************************************************
C
C     READ NUMBER OF TRAINING SAMPLES OF CLASS 1 AND 2
C     READ MEAN VECTORS OF CLASSES 1 AND 2
C     READ COVARIANCE MATRICES OF CLASS 1 AND 2
C
      READ(5,967) N1
      READ(5,967) N2
  967 FORMAT(I3)
      READ(5,130) M1
      READ(5,130) M2
      READ(5,130) SIGMA1
      READ(5,130) SIGMA2
  130 FORMAT(2X,5E14.7)
      N=12
C
C*********************************************************************
C
C     COMPUTE INVERSE OF COVARIANCE MATRIX OF CLASS 1


5 -*^(1 0/EGVALl(l>»*4)*(0 0/Nl ^B. 0/N2 0/ < Nl «ND > «40. 0/Nl«*S 

6 440 0/ND**2 44fl 0/NI«*3 <1H 0/N?4*3 4Mi! 0/(NI**C*N3) 

7 4012 0/(Nl«N2*»3) 41V20 0/(Hl«*2*N2**2> 4576 0/(NI««3»N31 

B 4076 0/<H24#3<ta) 42113 0/ <N1 *3 ) 42112 0/ (Nl**3*N2»*2> 

9 42D04 0/<N1**3#N2«43> 44 0»DS0R21 ( 1> * < 4 0/Nt 4B 0/N2 

• 40 0/Nl«*2 440 a/N2«*2 464. 0/(Nl«U2) 4?06. 0/<Nl4N2««P) 

• 4V6 0/«Nl*#r*N2> 442. 0/N2**3 420B 0/<Nl*N24*3) 4302 0/(NI#*2*N2 

• •«2> 43B4. 0/<Nt**2«N2**3) ) 44 0*D;>0H21 1 I >*«2*<2. 0/Nl 4B 0/N2 

• 440 0/N2«*2 424 0/(Nl*N2) 44B 0/N2«»3 4BB 0/<NI«N2*«2) 

• 496 0/<Nl«N2«»3>)>> 

VSCnA(2)«VSGHA(2>44 0«( (ECVALI ( I )»«4)*(B 0/NI *0 0/N2 

1 4I2Q. 0/(Nt«N2> 440. 0/Nt«*2 440 0/N2««2 440 0/NI««3 440 0/N2*«3 

2 4012. 0/«N1*42«N2> 4512 0/<N1«N2*#2> 41920. 0/<N1«»2«N2»#2) 

3 4576 0/<Nl**3fN2> 4576. 0/ <N1 »N2*#3) 

4 42112 0/(N1»<»3»N2»«2> 42H2 0/<Nl*«2«N2»*3> 42304 0/ (N1«43«N2**3 

5 >> 4<4. OcECVALl ( I )**3>*(DS0R2I< I >4(0 O/NI 44 0/N2 4U 0/N2*«2 

6 440. 0/N1442 464 0/(Nl«N2) 4256 0/<Nl**2«N2) 496. 0/ < N2*»2*N1 > 

7 440, 0/Nl«*3 420B. 0/<N2**3«»Nl) 4362. O/ <N14 42«N2«*2> 4 

B 334 0/<N2«*2«Nl4*3) ) -<4. 0/Nl 44. 0/N2 4H 0/Nl»*2 4B. 0/N2'/»2 
9 432 0/(N14N2> 440 0/(N1*N2«42> 44B. 0/ (N1*«2*N2) 

• 464 0,'CNI**2*N2««2)>) 4(2, 0«EGVALI( 1)4*2)#<4. 0/Nl 44. 0/N2 

• 4B 0/(Nl*H2) 4(2. 0*DS0R21<I)**2)*(0 O/Nl 42. 0/N2 440, 0/NI**2 

• 424. 0/(Nl*N2> 440. 0/Nl«*3 4B0. 0/ <N14*2*N2> 496. 0/ (N1**3#N2) ) 

• -4. 0«D50R21(I>*(2. 0/N2 44. 0/Nl 412. 0/ (N1*N2) 46. 0/Nl«*2 

• 416 0/(N14«2*N24*2) ) ) > 

      DO 141 J=1,2
      IF(A(J).GT.0.0) GO TO 979
      MEANS(J)=MEANS(J)+A(J)*(1.0+B(J)**2)
      SGMS(J)=SGMS(J)+2.0*((A(J)**2)*(1.0+2.0*(B(J))**2))
  974 FORMAT(10X,'SGMRB = ',F20.4)

C
C*********************************************************************
C
C     CALCULATE MULTIPLICATIVE FACTOR AND NEW V AND
C
C*********************************************************************
C
  979 XZX=DFLOAT(I)

14'^ 5* ( X"'^*'*'') "•'* “*'2' 

  143 ASGMS(J)=SGMS(J)+CONST*DSQRT(VSGMA(J))
  192 ALPHS(J)=((MEANS(J)**2)/ASGMS(J))-1.0
      IF(ALPHS(J).GE.0.35) GAMAS(J)=1.0
      CS(J)=MEANS(J)-DSQRT((GAMAS(J)+1.0)*ASGMS(J))
      DELTAS(J)=ASGMS(J)/(MEANS(J)-CS(J))
      CS(J)=-MEANS(J)-DSQRT((GAMAS(J)+1.0)*ASGMS(J))
      IF(A(J).GT.0.0) GO TO 142
      GO TO 144
  142 MEANR(J)=MEANR(J)+A(J)*(1.0+B(J)**2)
      SGMR(J)=SGMR(J)+2.0*((A(J)**2)*(1.0+2.0*(B(J))**2))
  874 FORMAT(30X,F20.4)
  873 FORMAT(10X,F20.4)
      XZX=DFLOAT(I)
  144 ASGMR(J)=SGMR(J)+CONST*DSQRT(VSGMA(J))
  193 ALPHR(J)=((MEANR(J)**2)/ASGMR(J))-1.0
      IF(ALPHR(J).GE.0.35) GAMAR(J)=1.0
      CR(J)=MEANR(J)-DSQRT((GAMAR(J)+1.0)*ASGMR(J))
      DELTAR(J)=ASGMR(J)/(MEANR(J)-CR(J))
  141 CONTINUE
C
C*********************************************************************
C
C     CALCULATE PROBABILITY OF ERROR
C
C*********************************************************************
C
      PSI=PSI+DLOG(EGVAL1(I))+((DD2(I)-DD1(I))**2)/(EGVAL1(I)-1.0)
      DO 145 J=1,2
      DIST(J)=PSI-(CR(J)-CS(J))
  145 CONTINUE
      DO 146 K=1,2
      IF(DIST(K).LT.0.0) GO TO 147
      IF(DELTAR(K).EQ.0.0) GO TO 148
      ERROR(K)=1.0-((DELTAR(K)/(DELTAR(K)+DELTAS(K)))**
     1(GAMAS(K)+1.0))*(((DIST(K)/DELTAR(K))+1.0+((GAMAR(K)+
     2GAMAS(K))*DELTAS(K))/(DELTAR(K)+DELTAS(K)))**GAMAR(K))*
     3DEXP(-DIST(K)/DELTAR(K))
      GO TO 146
  148 ERROR(K)=1.0
      GO TO 146





c 

CAll. LINVr'MHlOMAl.N. AINV, I DOT. EE 1 . EE3. UK. lER) 

      WRITE(6,117) IER

It 7 fOKMATt' '. la) 

C
C*********************************************************************
C
C     INVERSE COVARIANCE MATRIX 1 MULTIPLIED BY
C     COVARIANCE MATRIX 2
C
C*********************************************************************
C

CALL VMUl.r.r.<AINV, S1GMA3. N. PSIS2. N) 

C
C*********************************************************************
C
C     COMPUTE EIGENVALUES AND EIGENVECTORS OF (INVERSE
C     SIGMA1)*(SIGMA2)
C
C*********************************************************************
C
      CALL EIGRF(PSIS2,N,N,2,EGVALR,EGVECR,N,WR,IERR)
      WRITE(6,117) IERR
      WRITE(6,156) WR(1)
  156 FORMAT(' ',F6.1)

C
C*********************************************************************
C
C     NORMALIZING EIGENVECTORS (SEE FUKUNAGA,
C     PAGE ??)
C
C*********************************************************************
C
      CALL VCVTSF(SIGMA1,N,SIGM1S,N)
      CALL VCVTSF(SIGMA2,N,SIGM2S,N)
      DO 10 I=1,N
      DO 20 J=1,N
      EGVECT(1,J)=DREAL(EGVEC(J,I))
      EGVECO(J,1)=DREAL(EGVEC(J,I))
   20 CONTINUE
      M=N
      NN=N
      CALL VMULFF(EGVECT,SIGM1S,1,M,NN,1,N,CC,1,IEER)
      WRITE(6,126) IEER
      CALL VMULFF(CC,EGVECO,1,M,1,1,N,AA,1,IER)
      WRITE(6,126) IER
      AA(1,1)=DSQRT(AA(1,1))
      DO 30 K=1,N
      EGVEC(K,I)=EGVEC(K,I)/AA(1,1)
   30 CONTINUE
   10 CONTINUE
C
C*********************************************************************
C
C     CALCULATE NEW MEAN VECTOR DI = EGVEC*MI
C
C*********************************************************************
C
      DO 90 I=1,N
      D1(I)=(0.0,0.0)
      D2(I)=(0.0,0.0)
   90 CONTINUE
C
C*********************************************************************
C
C     CALCULATE NEW MEAN VECTORS
C
C*********************************************************************
C
      DO 95 I=1,N
      DO 95 J=1,N
      DEGVEC(I,J)=DREAL(EGVEC(J,I))
      EGVAL1(I)=DREAL(EGVAL(I))
      DDEGVC(I,J)=DREAL(EGVEC(I,J))
      D1(I)=EGVEC(J,I)*M1(J)+D1(I)
      D2(I)=EGVEC(J,I)*M2(J)+D2(I)
      TRANS1(I,J)=0.0
   95 CONTINUE

103 1 f.if-M (Of 14. •’) 

      DO 777 I=1,N
      DD1(I)=DREAL(D1(I))
      DD2(I)=DREAL(D2(I))
  777 CONTINUE


C     ORDER THE EIGENVALUES AND EIGENVECTORS ACCORDING TO
C     MAXIMUM EIGENVALUE
C
      DO 120 I=1,N
      DO 120 J=1,N
      IF(EGVAL1(I)-EGVAL1(J)) 120,120,131
  131 TEMP=EGVAL1(I)
      TEMPP=DD1(I)
      TTEMP=DD2(I)
      EGVAL1(I)=EGVAL1(J)
      DD1(I)=DD1(J)
      DD2(I)=DD2(J)
      EGVAL1(J)=TEMP
      DD1(J)=TEMPP
      DD2(J)=TTEMP
      DO 132 K=1,N
      TEMP1(K)=DDEGVC(K,I)
      DDEGVC(K,I)=DDEGVC(K,J)
      DDEGVC(K,J)=TEMP1(K)
  132 CONTINUE
  120 CONTINUE


C     INITIALIZE ALL PARAMETERS UNDER CONSIDERATION
C
      WRITE(6,136)
      DO 134 I=1,N
      DO 134 J=1,N
      TRANS(I,J)=DDEGVC(J,I)
  134 CONTINUE
      DO 135 II=1,2
      MEANR(II)=0.0
      MEANS(II)=0.0
      SGMR(II)=0.0
      SGMS(II)=0.0
      GAMAR(II)=0.0
      GAMAS(II)=0.0
      ALPHR(II)=0.0
      ALPHS(II)=0.0
      DELTAR(II)=0.0
      DELTAS(II)=0.0
      CR(II)=0.0
      CS(II)=0.0
      PSI=0.0
      VSGMA(II)=0.0
  135 CONTINUE
C
C     CALCULATE PARAMETERS OF GAMMA DISTRIBUTIONS
C
  136 FORMAT(' ',10X,'FIRST N DIMENSIONS',10X,'PROBABILITY OF ERROR')
      DO 140 I=1,N
      A(1)=1.0-1.0/EGVAL1(I)
      B(1)=(DD1(I)-DD2(I))/(EGVAL1(I)-1.0)
      A(2)=EGVAL1(I)-1.0
      B(2)=(DSQRT(EGVAL1(I))*(DD1(I)-DD2(I)))/(EGVAL1(I)-1.0)
      DSQR21(I)=(DD1(I)-DD2(I))**2
C
C     CALCULATE VAR(V) AND VAR(U)
C

VSGMAI 1 )=VSGMA( 1 )-*4 0»( (2. O/EOVALl ( I )**P1»(4 0/Nl *4 0/N2 

1 fS 0/(Nl«.\.’)) -(4 C/EOVALl ( I)«*3) » (4 O.'N! >4 0/N2 ♦ 

2 0 0/Nl*.- -«!) ('M.S*!.;.' *33 0/<NI«r6"') -»4it 0/ <N1 *n:‘* » 

3+40 0/<Nl*«2 + NP) +-.4 0/ ( N1 **2* N.’ . ' + 4 Ofl' . 

4 (1.0/Nl ■*2 0/N2 +6 0/(Nl<N2) +4 0.tJP‘"P +B 0/ i N I *, *2 J ) > 





  147 IF(DELTAS(K).EQ.0.0) GO TO 149
      ERROR(K)=((DELTAS(K)/(DELTAS(K)+DELTAR(K)))**(GAMAR(K)+1.0))*
     1(((-DIST(K)/DELTAS(K))+1.0+((GAMAR(K)+GAMAS(K))*DELTAR(K))/
     2(DELTAR(K)+DELTAS(K)))**GAMAS(K))*DEXP(DIST(K)/DELTAS(K))
      GO TO 146
  149 ERROR(K)=0.0
  146 CONTINUE

K*X=1 0-ERR0R<2> 

      PERROR=0.5*(1.0-ERROR(1)+ERROR(2))
      PCC=1.0-PERROR
      FORMAT(44X,F20.4)
      WRITE(6,150) I,PCC
FO'^MAK' 16X. 12. 25X.F7. 9) 

      FORMAT(' ',5X,F10.3,5X,F10.3)
      CONTINUE
      WRITE(6,199)
  199 FORMAT(/)

CONTINUE 

CONTINUE 


C     PRINT TRANSFORMATION MATRIX AND NEW MEAN AND COVARIANCE
C     MATRICES
C
      WRITE(6,919)
  919 FORMAT(10X,'TRANSFORMATION VECTOR')
      WRITE(6,103) ((TRANS(I,J),J=1,N),I=1,N)
      WRITE(6,920)
  920 FORMAT(//)
      WRITE(6,921)
  921 FORMAT(10X,'NEW MEAN VECTORS AND COVARIANCE MATRICES OF CLASS 1
     * AND 2')
      WRITE(6,165) (DD1(I),I=1,N)
      WRITE(6,165) (DD2(I),I=1,N)
  165 FORMAT('MN',5E14.7)

      DO 740 I=1,N
      DO 746 J=I,N
      IF(I.NE.J) GO TO 747
      SS1NEW(I+(I*(I-1))/2)=1.0
      SS2NEW(I+(I*(I-1))/2)=EGVAL1(I)
      GO TO 746
  747 SS1NEW(I+(J*(J-1))/2)=0.0
      SS2NEW(I+(J*(J-1))/2)=0.0
  746 CONTINUE
  740 CONTINUE
      NMN=N*(N+1)/2
      WRITE(6,175) (SS1NEW(I),I=1,NMN)
      WRITE(6,175) (SS2NEW(I),I=1,NMN)
  175 FORMAT('CV',5E14.7)

430 STOP 
END 
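HUGHES builds its transformation from IMSL routines. It calls EIGRF on the product of the inverse of SIGMA1 and SIGMA2, then scales each eigenvector φ so that φᵀΣ1φ = 1, following Fukunaga. This is the standard simultaneous diagonalization of two covariance matrices. After the transform, class 1 has identity covariance and class 2 has diag(λ). The program writes exactly that into SS1NEW and SS2NEW, in packed storage with index I + J(J-1)/2 (1-based, I ≤ J).

The NumPy sketch below illustrates the same transform under assumptions; it is not a line-by-line equivalent. It goes through a Cholesky factor of Σ1 and a symmetric eigen-solve, which gives real eigenvalues directly instead of EIGRF's complex output. The `descending` ordering follows the listing's comment "according to maximum eigenvalue". The function names are hypothetical.

```python
import numpy as np


def simultaneous_diagonalize(sigma1, sigma2, descending=True):
    """Return (eigvals, phi) with phi.T @ sigma1 @ phi = I and phi.T @ sigma2 @ phi = diag(eigvals).

    The eigenvalues are those of inv(sigma1) @ sigma2; the columns of phi are the
    corresponding eigenvectors, scaled so that phi_k.T @ sigma1 @ phi_k = 1.
    """
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    lower = np.linalg.cholesky(sigma1)           # sigma1 = lower @ lower.T
    lower_inv = np.linalg.inv(lower)
    m = lower_inv @ sigma2 @ lower_inv.T         # symmetric, similar to inv(sigma1) @ sigma2
    eigvals, u = np.linalg.eigh(m)               # ascending eigenvalues, orthonormal u
    phi = lower_inv.T @ u                        # back-transform the eigenvectors
    if descending:
        order = np.argsort(eigvals)[::-1]
        eigvals, phi = eigvals[order], phi[:, order]
    return eigvals, phi


def pack_upper(matrix):
    """Packed symmetric storage: element (i, j) with i <= j (1-based) goes to i + j*(j-1)/2."""
    n = matrix.shape[0]
    packed = np.zeros(n * (n + 1) // 2)
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            packed[i + j * (j - 1) // 2 - 1] = matrix[i - 1, j - 1]
    return packed
```

Given mean vectors `m1` and `m2`, the transformed means corresponding to DD1 and DD2 would be `phi.T @ m1` and `phi.T @ m2`. `pack_upper(phi.T @ sigma2 @ phi)` then corresponds to SS2NEW. For N = 12 the packed length is 12·13/2 = 78, which matches the dimension of SIGMA1 and SIGMA2 in the listing.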
Appendix F

Description of Data Sets For Experiments 


F.1 Training and Test Fields for Aircraft, Simulated Data Set (Tape 203, file 3)


Training Fields


CLASS CORN
RUN (71053900), LINE (304, 326, 2), COL (109, 133, 2)
RUN (71053900), LINE (512, 528, 1), COL (87, 93, 1)
RUN (71053900), LINE (620, 636, 1), COL (107, 123, 2)
RUN (71053900), LINE (656, 676, 2), COL (33, 59, 2)
CLASS FOREST
RUN (71053900), LINE (798, 812, 1), COL (141, 161, 2)
RUN (71053900), LINE (704, 720, 1), COL (147, 155, 1)
RUN (71053900), LINE (726, 736, 1), COL (81, 95, 1)


Test Fields (Also Area Classified)


TEST CORN
RUN (71053900), LINE (143, 154, 1), COL (42, 57, 1)
RUN (71053900), LINE (305, 318, 1), COL (116, 132, 1)
RUN (71053900), LINE (403, 413, 1), COL (17, 33, 1)
RUN (71053900), LINE (643, 657, 1), COL (121, 127, 1)
RUN (71053900), LINE (684, 691, 1), COL (11, 30, 1)
RUN (71053900), LINE (857, 866, 1), COL (34, 53, 1)
TEST FOREST
RUN (71053900), LINE (424, 430, 1), COL (161, 173, 1)
RUN (71053900), LINE (521, 531, 1), COL (142, 162, 1)
RUN (71053900), LINE (711, 728, 1), COL (149, 158, 1)
RUN (71053900), LINE (769, 779, 1), COL (127, 148, 1)
RUN (71053900), LINE (837, 851, 1), COL (155, 162, 1)
RUN (71053900), LINE (923, 931, 1), COL (70, 79, 1)
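Each field above is specified by a run number plus LINE(first, last, interval) and COL(first, last, interval). This sketch assumes the usual LARSYS reading, in which the interval selects every k-th line or column starting from the first one. Under that assumption, it counts how many pixels a field contributes. The regular expression and helper names are hypothetical and were written only for the text layout used in this appendix.

```python
import re

# Matches e.g. "RUN (71053900), LINE (304, 326, 2), COL (109, 133, 2)"
FIELD_RE = re.compile(
    r"RUN\s*\((\d+)\),\s*LINE\s*\((\d+),\s*(\d+),\s*(\d+)\),"
    r"\s*COL\s*\((\d+),\s*(\d+),\s*(\d+)\)"
)


def parse_field(text):
    """Return (run, (first, last, step) for lines, (first, last, step) for columns)."""
    match = FIELD_RE.search(text)
    if match is None:
        raise ValueError(f"not a field description: {text!r}")
    values = [int(g) for g in match.groups()]
    return values[0], tuple(values[1:4]), tuple(values[4:7])


def count_samples(first, last, step):
    """Number of positions first, first+step, ... that do not exceed last."""
    return 0 if last < first else (last - first) // step + 1


def field_pixels(text):
    """Pixels in one field: sampled lines times sampled columns."""
    _, lines, cols = parse_field(text)
    return count_samples(*lines) * count_samples(*cols)
```

For example, the first corn training field, LINE (304, 326, 2) and COL (109, 133, 2), samples 12 lines and 13 columns, or 156 pixels. Summing `field_pixels` over the lines of one class gives that class's sample size under the same assumption.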







F.2 Training and Test Fields for Aircraft, Real Data Set (Tape 203, file 1)


Training Fields


CLASS CORN
RUN (71053900), LINE (304, 326, 2), COL (109, 133, 2)
RUN (71053900), LINE (512, 528, 1), COL (87, 93, 1)
RUN (71053900), LINE (620, 636, 1), COL (107, 123, 2)
RUN (71053900), LINE (656, 676, 2), COL (33, 59, 2)
CLASS FOREST
RUN (71053900), LINE (798, 812, 1), COL (141, 161, 2)
RUN (71053900), LINE (704, 720, 1), COL (147, 155, 1)
RUN (71053900), LINE (726, 736, 1), COL (81, 95, 1)


Test Fields (Also Area Classified)


TEST CORN
RUN (71053900), LINE (227, 247, 1), COL (81, 96, 1)
RUN (71053900), LINE (334, 351, 1), COL (66, 100, 3)
RUN (71053900), LINE (452, 474, 2), COL (108, 119, 1)
RUN (71053900), LINE (597, 611, 1), COL (137, 153, 2)
RUN (71053900), LINE (646, 664, 1), COL (101, 128, 2)
RUN (71053900), LINE (711, 721, 1), COL (102, 113, 1)
TEST FOREST
RUN (71053900), LINE (241, 249, 1), COL (27, 45, 1)
RUN (71053900), LINE (509, 527, 1), COL (181, 193, 1)
RUN (71053900), LINE (729, 751, 2), COL (201, 217, 1)
RUN (71053900), LINE (765, 803, 2), COL (191, 203, 2)
RUN (71053900), LINE (833, 855, 2), COL (151, 171, 2)
RUN (71053900), LINE (989, 1005, 1), COL (141, 155, 2)





F.3 Training and Test Fields for Landsat, Multitemporal, Simulated Data Set


Training Fields


CLASS CORN
RUN (78843016), LINE (25, 32, 1), COL (33, 42, 1)
RUN (78843016), LINE (62, 67, 1), COL (133, 141, 1)
RUN (78843016), LINE (30, 33, 1), COL (87, 102, 1)
RUN (78843016), LINE (91, 97, 1), COL (79, 86, 1)
CLASS SOYB
RUN (78843016), LINE (9, 12, 1), COL (61, 77, 1)
RUN (78843016), LINE (74, 82, 1), COL (51, 64, 1)
RUN (78843016), LINE (110, 117, 1), COL (167, 172, 1)


Test Fields (Also Area Classified)


TEST CORN
RUN (78843016), LINE (2, 12, 1), COL (30, 34, 1)
RUN (78843016), LINE (38, 46, 1), COL (18, 26, 1)
RUN (78843016), LINE (55, 58, 1), COL (103, 117, 1)
RUN (78843016), LINE (16, 22, 1), COL (123, 127, 1)
RUN (78843016), LINE (70, 73, 1), COL (80, 89, 1)
RUN (78843016), LINE (85, 93, 1), COL (47, 50, 1)
RUN (78843016), LINE (102, 104, 1), COL (140, 155, 1)
RUN (78843016), LINE (107, 115, 1), COL (11, 15, 1)
TEST SOYBEANS
RUN (78843016), LINE (1, 4, 1), COL (91, 100, 1)
RUN (78843016), LINE (16, 20, 1), COL (56, 70, 1)
RUN (78843016), LINE (32, 34, 1), COL (114, 126, 1)
RUN (78843016), LINE (49, 51, 1), COL (113, 125, 1)
RUN (78843016), LINE (76, 84, 1), COL (31, 40, 1)
RUN (78843016), LINE (99, 106, 1), COL (127, 132, 1)
RUN (78843016), LINE (106, 114, 1), COL (53, 59, 1)
F.4 Training and Test Fields for Landsat, Multitemporal, Real Data Set (Tape 203, file 5)


Training Fields


CLASS CORN
RUN (78843016), LINE (26, 32, 1), COL (32, 42, 1)
RUN (78843016), LINE (91, 98, 1), COL (79, 86, 1)
RUN (78843016), LINE (62, 67, 1), COL (134, 141, 1)
RUN (78843016), LINE (30, 34, 1), COL (91, 102, 1)
CLASS SOYB
RUN (78843016), LINE (9, 13, 1), COL (68, 78, 1)
RUN (78843016), LINE (74, 82, 1), COL (51, 63, 1)
RUN (78843016), LINE (100, 105, 1), COL (120, 132, 1)


Test Fields (Also Area Classified)


TEST CORN
RUN (78843016), LINE (2, 11, 1), COL (27, 32, 1)
RUN (78843016), LINE (38, 46, 1), COL (19, 25, 1)
RUN (78843016), LINE (103, 106, 1), COL (140, 156, 1)
RUN (78843016), LINE (101, 115, 1), COL (12, 17, 1)
RUN (78843016), LINE (78, 86, 1), COL (124, 128, 1)
RUN (78843016), LINE (67, 74, 1), COL (94, 98, 1)
RUN (78843016), LINE (35, 41, 1), COL (123, 127, 1)
TEST SOYBEANS
RUN (78843016), LINE (41, 44, 1), COL (67, 79, 1)
RUN (78843016), LINE (79, 84, 1), COL (31, 40, 1)
RUN (78843016), LINE (106, 114, 1), COL (54, 59, 1)
RUN (78843016), LINE (44, 51, 1), COL (118, 123, 1)
RUN (78843016), LINE (1, 4, 1), COL (90, 100, 1)
RUN (78843016), LINE (109, 113, 1), COL (132, 147, 1)
RUN (78843016), LINE (44, 47, 1), COL (155, 161, 1)
F.5 Training and Test Fields for Aircraft Binary Tree Example (Tape 203, file 1)


Training Fields


RUN        NO.  LINE (FIRST, LAST, INT)  COL (FIRST, LAST, INT)

CLASS WHT1
71053900   11   626, 626, 1              162, 162, 1NS-
71053900   12   627, 627, 1              164, 164, 1NS-
71053900   14   628, 628, 1              159, 159, 1NS-
71053900   16   629, 629, 1              163, 163, 1NS-
71053900   22   635, 635, 1              167, 167, 1NS-
71053900    3   461, 461, 1              71, 71, 2NS-
71053900    4   461, 461, 1              79, 79, 2NS-
71053900    9   463, 463, 1              75, 75, 2NS-
71053900    4   621, 621, 1              167, 167, 1NS-
71053900   10   624, 624, 1              159, 159, 1NS-
71053900   20   633, 633, 1              161, 161, 1NS-
71053900   21   634, 634, 1              163, 163, 1NS-
71053900   27   639, 639, 1              163, 163, 1NS-

CLASS WHT2
71053900    3   314, 314, 1              163, 163, 1NS-
71053900    6   316, 316, 1              166, 166, 1NS-
71053900    7   317, 317, 1              159, 159, 1NS-
71053900    8   318, 318, 1              157, 157, 1NS-
71053900   10   319, 319, 1              157, 157, 1NS-
71053900   17   324, 324, 1              167, 167, 1NS-
71053900   18   325, 325, 1              165, 165, 1NS-
71053900   21   327, 327, 1              167, 167, 1NS-
71053900   22   328, 328, 1              158, 158, 1NS-
71053900    7   462, 462, 1              79, 79, 2NS-
71053900   10   463, 463, 1              77, 77, 2NS-
71053900   17   469, 469, 1              67, 67, 2NS-
71053900   21   471, 471, 1              75, 75, 2NS-

CLASS HAY
71053900    2   484, 484, 1              55, 55, 2NS-
71053900    1   880, 880, 1              132, 132, 1NS-
71053900    3   882, 882, 1              126, 126, 1NS-
71053900    7   883, 883, 1              126, 126, 1NS-
71053900   14   886, 886, 1              128, 128, 1NS-
71053900   15   887, 887, 1              133, 133, 1NS-
71053900   18   889, 889, 1              134, 134, 1NS-
71053900   19   890, 890, 1              135, 135, 1NS-
71053900   20   891, 891, 1              128, 128, 1NS-
71053900   30   895, 895, 1              132, 132, 1NS-
71053900   13   488, 488, 1              41, 41, 2NS-
71053900   16   490, 490, 1              43, 43, 2NS-
71053900   19   894, 894, 1              135, 135, 1NS-

CLASS PAS1
71053900    2   402, 402, 1              157, 157, 2NS-
71053900   32   417, 417, 1              153, 153, 2NS-
71053900   34   418, 416, 1              149, 149, 2NS-
71053900    1   1012, 1012, 1            101, 101, 1NS-
71053900    1   1012, 1012, 1            102, 102, 1NS-
71053900    1   1012, 1012, 1            107, 107, 1NS-
71053900    5   1014, 1014, 1            101, 101, 1NS-
71053900    6   1015, 1015, 1            103, 103, 1NS-
71053900    7   1016, 1016, 1            102, 102, 1NS-
71053900   10   1017, 1017, 1            113, 113, 1NS-
71053900   10   1017, 1017, 1            115, 115, 1NS-
71053900   12   1018, 1016, 1            112, 112, 1NS-
71053900   15   1020, 1020, 1            107, 107, 1NS-

CLASS PAS2
71053900    0   418, 416, 1              147, 147, 2
71053900    0   588, 588, 1              67, 67, 2
71053900    0   589, 589, 1              65, 65, 2
71053900    0   589, 589, 1              67, 67, 2
71053900    0   589, 589, 1              69, 69, 2
71053900    0   589, 569, 1              75, 75, 2
71053900    0   593, 593, 1              71, 71, 2
71053900    0   595, 595, 1              61, 61, 2
71053900    0   595, 595, 1              71, 71, 2
71053900    0   596, 596, 1              57, 57, 2
71053900    0   596, 596, 1              59, 59, 2
71053900    0   596, 596, 1              67, 67, 2
71053900    0   597, 597, 1              63, 63, 2

CLASS SOY
71053900    4   424, 424, 2              125, 125, 2NS-
71053900    3   336, 336, 2              165, 165, 2NS-
71053900   22   352, 352, 2              165, 165, 2NS-
71053900    1   488, 488, 2              123, 123, 2NS-
71053900    2   488, 486, 2              133, 133, 2NS-
71053900   22   500, 500, 2              127, 127, 2NS-
71053900    9   312, 312, 2              63, 63, 2NS-
71053900   10   312, 312, 2              67, 67, 2NS-
71053900    5   424, 424, 2              131, 131, 2NS-
71053900    7   426, 426, 2              113, 113, 2NS-
71053900   11   426, 426, 2              137, 137, 2NS-
71053900   41   440, 440, 2              137, 137, 2NS-
71053900   23   502, 502, 2              119, 119, 2NS-

CLASS CRN
71053900    8   516, 516, 1              93, 93, 1NS-
71053900   10   518, 518, 1              87, 87, 1NS-
71053900   17   521, 521, 1              93, 93, 1NS-
71053900   11   623, 623, 1              121, 121, 2NS-
71053900   15   625, 625, 1              123, 123, 2NS-
71053900    3   656, 656, 2              53, 53, 2NS-
71053900   23   322, 322, 2              119, 119, 2NS-
71053900   29   326, 326, 2              111, 111, 2NS-
71053900   19   527, 527, 1              90, 90, 1NS-
71053900    8   660, 660, 2              35, 35, 2NS-
71053900   16   664, 664, 2              45, 45, 2NS-
71053900   24   666, 668, 2              55, 55, 2NS-
71053900   29   672, 672, 2              41, 41, 2NS-

CLASS FST
71053900   11   731, 731, 1              85, 85, 1NS-
71053900   13   709, 709, 1              154, 154, 1NS-
71053900   17   711, 711, 1              151, 151, 1NS-
71053900   32   718, 718, 1              147, 147, 1NS-
71053900    3   726, 726, 1              90, 90, 1NS-
71053900    4   726, 726, 1              95, 95, 1NS-
71053900   27   732, 732, 1              95, 95, 1NS-
71053900   32   735, 735, 1              82, 82, 1NS-
71053900   15   803, 803, 1              149, 149, 2NS-
71053900   20   805, 805, 1              145, 145, 2NS-
71053900   30   809, 809, 1              141, 141, 2NS-
71053900   11   709, 709, 1              151, 151, 1NS-
71053900   28   718, 718, 1              151, 151, 1NS-

CLASS WAT
71053900    5   ?, ?, 1                  165, 165, 1NS-
71053900    ?   891, 891, 1              162, 162, 1NS-
71053900    9   892, 892, 1              164, 164, 1NS-
71053900    1   936, 936, 1              139, 139, 1NS-
71053900    3   938, 938, 1              141, 141, 1NS-
71053900    3   938, 938, 1              143, 143, 1NS-
71053900    6   939, 939, 1              143, 143, 1NS-
71053900    6   939, 939, 1              146, 146, 1NS-
71053900    8   941, 941, 1              140, 140, 1NS-
71053900   10   943, 943, 1              138, 138, 1NS-
71053900   11   944, 944, 1              140, 140, 1NS-
71053900   14   947, 947, 1              141, 141, 1NS-
71053900   15   948, 948, 1              141, 141, 1NS-


Test Fields (Also Area Classified)


RUN        FIELD  LINE (FIRST, LAST, INT)  COL (FIRST, LAST, INT)  DESCRIPTION

TEST WHEAT
71053900   ?      304, 312, 1              155, 161, 1              WHEATCUT
71053900   UU6    839, 848, 1              67, 70, 1                WHEATCUT
71053900   U6     854, 861, 1              73, 77, 1                WHEATCUT
71053900   UU7    829, 851, 2              73, 91, 2                WHEAT
71053900   HH3    619, 641, 2              151, 161, 1              WHEAT
71053900   0G2    569, 575, 1              145, 148, 1              OATSCUT
71053900   FF9    459, 475, 2              81, 99, 1                OATS

TEST HAY
71053900   Z22    873, 887, 1              19, 67, 2                HAY
71053900   L8     899, 923, 2              85, 99, 1                HAY
71053900   C4     252, 275, 2              33, 35, 1                HAY
71053900   02     659, 661, 1              92, 96, 1                HAY
71053900   05     713, 715, 1              39, 50, 1                HAY
71053900   CC2    361, 387, 2              155, 165, 1              HAY
71053900   BB9    313, 327, 1              173, 185, 1              HAY

TEST PASTURE
71053900   L2     589, 599, 1              77, 93, 1                PASTURE
71053900   Z21    1021, 1031, 1            103, 117, 1              PASTURE
71053900   01     731, 743, 1              31, 55, 2                PASTURE
71053900   12     669, 675, 1              101, 123, 2              PASTURE
71053900   T9     1013, 1037, 2            201, 211, 1              PASTURE
71053900   HH9    683, 693, 1              97, 129, 2               PASTURE
71053900   EE5    421, 439, 2              177, 191, 1              PASTURE
71053900   Z20    423, 445, 2              11, 27, 1                PASTURE

TEST SOYBEANS
71053900   DD6    593, 613, 1              101, 127, 2              SOYBEANS
71053900   04     649, 687, 2              77, 83, 1                SOYBEANS
71053900   RR2    861, 867, 1              123, 149, 2              SOYBEANS
71053900   115    649, 671, 2              177, 191, 1              SOYBEANS
71053900   002    479, 519, 2              105, 139, 2              SOYBEANS
71053900   R7     449, 475, 2              27, 55, 2                SOYBEANS
71053900   Z9     205, 231, 2              195, 211, 2              SOYBEANS

TEST CORN
71053900   A3     227, 247, 1              81, 96, 1                CORN
71053900   A5     225, 247, 1              49, 59, 1                CORN
71053900   C1     283, 295, 1              67, 95, 2                CORN
71053900   F5     374, 387, 1              89, 99, 1                CORN
71053900   DD3    452, 474, 2              108, 119, 1              CORN
71053900   HH1    597, 611, 1              137, 153, 2              CORN
71053900   JJ1    711, 721, 1              102, 113, 1              CORN
71053900   Z15    481, 515, 2              3, 21, 2                 CORN
71053900   F6     373, 387, 1              47, 79, 2                CORN
71053900   Z16    305, 327, 2              191, 205, 1              CORN

TEST FOREST
71053900   A10    241, 249, 1              27, 45, 2                FOREST
71053900   Z6     729, 751, 2              201, 217, 2              FOREST
71053900   Z3     765, 803, 2              191, 203, 1              FOREST
71053900   EE4    522, 525, 1              155, 159, 1              FOREST
71053900   RR4    833, 855, 1              151, 171, 1              FOREST
71053900   HH10   765, 799, 2              139, 159, 2              FOREST
71053900   M3     783, 795, 1              49, 81, 2                FOREST
71053900   Z18    375, 387, 1              191, 201, 1              FOREST

TEST WATER
71053900   A9     205, 209, 1              34, 38, 1                PONDWATR
71053900   U2     817, 819, 1              49, 51, 1                PONDWATR
71053900   A7     221, 224, 1              27, 29, 1                PONDWATR
71053900   W3     1000, 1004, 1            51, 54, 1                WATER
71053900   W2     1010, 1014, 1            36, 39, 1                WATER
71053900   QQ7    969, 973, 1              ?, 131, 1                WATER
71053900   W7     849, 855, 1              205, ?, 1                WATER
71053900   W6     873, 879, 1              185, 191, 1              WATER
71053900   W5     977, 983, 1              113, 119, 1              WATER
71053900   W4     1041, 1047, 1            11, 15, 1                WATER
F.6 Training and Test Fields for Landsat, Multitemporal Binary Tree Example (Tape 203, file ?)


Training Fields


RUN        NO.  LINE (FIRST, LAST, INT)  COL (FIRST, LAST, INT)

CLASS CORN
78843016   0    28, 28, 1                33, 33, 1
78843016   0    29, 29, 1                35, 35, 1
78843016   0    30, 30, 1                37, 37, 1
78843016   0    30, 30, 1                42, 42, 1
78843016   0    32, 32, 1                34, 34, 1
78843016   0    32, 32, 1                35, 35, 1
78843016   0    32, 32, 1                39, 39, 1
78843016   0    64, 64, 1                134, 134, 1
78843016   0    64, 64, 1                137, 137, 1
78843016   0    65, 65, 1                141, 141, 1
78843016   0    30, 30, 1                93, 93, 1
78843016   0    30, 30, 1                96, 96, 1
78843016   0    34, 34, 1                102, 102, 1

CLASS SOYBEANS
78843016   0    11, 11, 1                69, 69, 1
78843016   0    13, 13, 1                72, 72, 1
78843016   0    74, 74, 1                57, 57, 1
78843016   0    74, 74, 1                63, 63, 1
78843016   0    75, 75, 1                52, 52, 1
78843016   0    76, 76, 1                56, 56, 1
78843016   0    76, 76, 1                61, 61, 1
78843016   0    77, 77, 1                53, 53, 1
78843016   0    80, 80, 1                60, 60, 1
78843016   0    81, 81, 1                59, 59, 1
78843016   0    82, 82, 1                58, 58, 1
78843016   0    100, 100, 1              125, 125, 1
78843016   0    101, 101, 1              130, 130, 1

CLASS ELSE
78843016   0    51, 51, 1                154, 154, 1
78843016   0    52, 52, 1                154, 154, 1
78843016   0    52, 52, 1                160, 160, 1
78843016   0    53, 53, 1                158, 158, 1
78843016   0    55, 55, 1                161, 161, 1
78843016   0    91, 91, 1                180, 180, 1
78843016   0    91, 91, 1                182, 182, 1
78843016   0    92, 92, 1                177, 177, 1
78843016   0    94, 94, 1                178, 178, 1
78843016   0    95, 95, 1                188, 188, 1
78843016   0    52, 52, 1                39, 39, 1
78843016   0    1, 1, 1                  50, 50, 1
78843016   0    7, 7, 1                  49, 49, 1
Test Fields (Also Area Classified)


TEST CORN
RUN (78843016), LINE (2, 11, 1), COL (27, 32, 1)
RUN (78843016), LINE (38, 46, 1), COL (19, 25, 1)
RUN (78843016), LINE (103, 106, 1), COL (140, 156, 1)
RUN (78843016), LINE (101, 115, 1), COL (12, 17, 1)
RUN (78843016), LINE (78, 86, 1), COL (124, 128, 1)
RUN (78843016), LINE (67, 74, 1), COL (94, 98, 1)
RUN (78843016), LINE (35, 41, 1), COL (123, 127, 1)
TEST SOYBEANS
RUN (78843016), LINE (41, 44, 1), COL (67, 79, 1)
RUN (78843016), LINE (79, 84, 1), COL (31, 40, 1)
RUN (78843016), LINE (106, 114, 1), COL (54, 59, 1)
RUN (78843016), LINE (44, 51, 1), COL (118, 123, 1)
RUN (78843016), LINE (1, 4, 1), COL (90, 100, 1)
RUN (78843016), LINE (109, 113, 1), COL (132, 147, 1)
RUN (78843016), LINE (44, 47, 1), COL (155, 161, 1)
TEST ELSE
RUN (78843016), LINE (33, 42, 1), COL (137, 141, 1)
RUN (78843016), LINE (54, 57, 1), COL (39, 52, 1)
RUN (78843016), LINE (55, 59, 1), COL (136, 149, 1)
RUN (78843016), LINE (95, 109, 1), COL (191, 194, 1)
RUN (78843016), LINE (108, 114, 1), COL (8?, 89, 1)



